Quantum procedures for approximate optimization by quenching

ABSTRACT

In this disclosure, example quantum algorithms for approximate optimization based on a sudden quench of a Hamiltonian are described. While the algorithm is general, it is analyzed in this disclosure in the specific context of MAX-EK-LIN-2, for both even and odd K. It is to be understood, however, that the algorithm can be generalized to other contexts. A duality can be found: roughly, either the algorithm provides some nontrivial improvement over random or there exist many solutions which are significantly worse than random. A classical approximation algorithm is then analyzed and a similar duality is found, though the quantum algorithm provides additional guarantees in certain cases.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application No. 62/782,254 entitled “QUANTUM PROCEDURES FOR APPROXIMATE OPTIMIZATION BY QUENCHING” filed on Dec. 19, 2018, which is hereby incorporated herein by reference in its entirety.

SUMMARY

In this disclosure, example quantum algorithms for approximate optimization based on a sudden quench of a Hamiltonian are described. While the algorithm is general, it is analyzed in this disclosure in the specific context of MAX-EK-LIN-2, for both even and odd K. It is to be understood, however, that the algorithm can be generalized to other contexts. A duality can be found: roughly, either the algorithm provides some nontrivial improvement over random or there exist many solutions which are significantly worse than random. A classical approximation algorithm is then analyzed and a similar duality is found, though the quantum algorithm provides additional guarantees in certain cases.

The embodiments disclosed herein include example methods for performing an approximate optimization technique using a quantum quench algorithm. In one embodiment, for instance, a quantum computing device is configured to perform an approximate optimization technique to approximate a solution to a combinatorial optimization problem. The approximate optimization technique is then performed on the quantum computing device. In this embodiment, the approximate optimization technique includes using a quantum quench algorithm. In certain implementations, the quench algorithm includes averaging state values over a plurality of times. In particular implementations, controlling the quantum computing device comprises changing coupling constants without using a jump or slow change in the coupling constants. In some examples, the changing of the coupling constants is performed non-adiabatically. In certain examples, the changing of the coupling constants is followed by an equilibration time. In some implementations, the method further comprises reading out results of the approximate optimization technique from the quantum computing device and storing the results in a classical computing device.

Also disclosed are example embodiments for performing a quantum quench algorithm on a classical computing device using simulation of a quantum Hamiltonian to perform approximate optimization.

The foregoing and other objects, features, and advantages of the disclosed technology will become more apparent from the following detailed description, which proceeds with reference to the accompanying figures.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a generalized example of a suitable classical computing environment in which several of the described embodiments can be implemented.

FIG. 2 is an example of a possible network topology (e.g., a client-server network) for implementing a system according to the disclosed technology.

FIG. 3 is another example of a possible network topology (e.g., a distributed computing environment) for implementing a system according to the disclosed technology.

FIG. 4 is an exemplary system for implementing the disclosed technology.

FIG. 5 is an example method for performing an approximate optimization technique using a quantum quench algorithm as disclosed herein.

FIG. 6 is another example method for performing an approximate optimization technique using a quantum quench algorithm as disclosed herein.

DETAILED DESCRIPTION

I. General Considerations

As used in this application, the singular forms “a,” “an,” and “the” include the plural forms unless the context clearly dictates otherwise. Additionally, the term “includes” means “comprises.” Further, the term “coupled” does not exclude the presence of intermediate elements between the coupled items. Further, as used herein, the term “and/or” means any one item or combination of any items in the phrase.

Although the operations of some of the disclosed methods are described in a particular, sequential order for convenient presentation, it should be understood that this manner of description encompasses rearrangement, unless a particular ordering is required by specific language set forth below. For example, operations described sequentially may in some cases be rearranged or performed concurrently. Moreover, for the sake of simplicity, the attached figures may not show the various ways in which the disclosed systems, methods, and apparatus can be used in conjunction with other systems, methods, and apparatus. Additionally, the description sometimes uses terms like “produce” and “provide” to describe the disclosed methods. These terms are high-level abstractions of the actual operations that are performed. The actual operations that correspond to these terms will vary depending on the particular implementation and are readily discernible by one of ordinary skill in the art.

II. Introduction

For many combinatorial optimization problems, one expects that it is not possible to obtain an exact solution in polynomial time. Instead, the best that one can hope for is to obtain an approximate solution. In this disclosure, a general approach to constructing quantum approximation algorithms is presented, based on the idea of a quench: a sudden change in the Hamiltonian. Further, a duality that one can call “pretty good or very bad” is analyzed, in which either the algorithm finds a nontrivial improvement over random (termed “pretty good”) or there exist many solutions which are significantly worse than random (termed “very bad”). A similar duality in the context of a classical approximation algorithm is also analyzed.

In this disclosure, the optimization problem MAX-EK-LIN-2 is considered, with the assumption of a degree bound explained below. Roughly speaking, this problem considers an objective function which is a sum of terms of degree K; a more precise definition is given later. Instances of degree D are considered, so that each degree of freedom participates in D terms of the problem. Previous work has shown that, for odd K, a classical algorithm can obtain a nontrivial approximation for MAX-EK-LIN-2, improving over a random assignment by an amount of order 1/√D (see, e.g., B. Barak, A. Moitra, R. O'Donnell, P. Raghavendra, O. Regev, D. Steurer, L. Trevisan, A. Vijayaraghavan, D. Witmer, and J. Wright, arXiv preprint arXiv:1505.03424 (2015)). Initially, a quantum algorithm was found providing weaker approximation guarantees (see, e.g., E. Farhi, J. Goldstone, and S. Gutmann, arXiv preprint arXiv:1412.6062 (2014)), but later a classical algorithm was discovered with better approximation guarantees. Further, for arbitrary K, the classical algorithm finds a solution which is either better than random by an amount of order 1/√D or worse than random by an amount of order 1/√D. For odd K, this implies the order 1/√D improvement: if the algorithm finds a result worse than random by order 1/√D, one can change the sign of all variables, which flips the sign of every monomial of odd degree, and obtain an improvement by order 1/√D.

Also considered is a slightly different but related classical approximation algorithm. For this algorithm it is found (for arbitrary K, though the result is most interesting for even K) that the duality above generalizes: rather than the solution being better or worse than random by order 1/√D, one can trade these amounts off so that the solution is either slightly better than random or much worse. In particular, if the instance does not have a solution which is much worse than random, then the algorithm is guaranteed to find a slight improvement.

Embodiments of the disclosed technology also include a quantum algorithm based on quenches. Rather than slowly changing a Hamiltonian as in the adiabatic algorithm (see, e.g., E. Farhi, J. Goldstone, S. Gutmann, J. Lapan, A. Lundgren, and D. Preda, Science 292, 472 (2001)), which in general is expected to have trouble with small energy gaps (see, e.g., B. Altshuler, H. Krovi, and J. Roland, Proceedings of the National Academy of Sciences 107, 12446 (2010)), one can suddenly change the Hamiltonian and then spend some time evolving under the new Hamiltonian. In accordance with certain embodiments, a general method for approximate optimization is disclosed. In particular examples, the general method is analyzed in the context of MAX-EK-LIN-2. Here, a similar duality is found but with slightly different requirements: in some ways, the quantum algorithm requires only weaker assumptions in order to find a nontrivial improvement.

A. Problem Definition and Examples

Consider the problem MAX-K-LIN-2. There are N degrees of freedom, each of which may take values in {−1, +1}. The objective function, which one can denote as H_(Z), is taken to be a weighted sum of monomials of degree at least 1 and at most K in these variables, i.e., each monomial is a product of at least 1 and at most K distinct variables. One can require that the weight of each monomial be chosen from {−1, +1}, and that all monomials be distinct from each other. MAX-EK-LIN-2 is the special case in which every monomial has degree exactly K.

The goal is to maximize this objective function. This is emphasized because a Hamiltonian which includes a term proportional to H_(Z) will be considered later, and the states of interest will be those near the highest-energy state of that Hamiltonian, rather than near the lowest-energy state as is more common in physics.

The variables are written as Z_(i) for i ∈ {1, . . . , N}, so that there are N variables; for MAX-E2-LIN-2 one has

$H_{Z} = \sum_{i<j} J_{ij} Z_{i} Z_{j}, \qquad (1)$

where J_(ij) is a matrix with entries chosen from {−1, 0, +1}.

A degree bound D will be assumed, so that each variable Z_(i) appears in at most D distinct monomials in H_(Z). Indeed, for simplicity, only the case where each Z_(i) appears in exactly D monomials in H_(Z) will be considered. Define N_(T) to be the number of terms in H_(Z), so that if every term has degree exactly K (i.e., for MAX-EK-LIN-2) one has

${N_{T} = \frac{DN}{K}},$

and for MAX-K-LIN-2 one has

$DN \geq N_{T} \geq \frac{DN}{K}. \qquad (2)$

A random assignment has expectation value of H_(Z) equal to 0. Typically in computer science, one regards each of these monomials as a constraint: the constraint is satisfied if the monomial (including its weight) is equal to +1 and it is violated otherwise, so that the number of satisfied constraints is equal to (H_(Z)+N_(T))/2. Hence, a random assignment satisfies half the constraints on average. The approximation ratio achieved by some assignment to the variables is then defined to be the fraction of constraints satisfied by that assignment divided by the fraction of constraints satisfied by the optimal assignment.

Here, the approximation ratio is defined differently: it is the value of H_(Z) for a given assignment divided by the value of H_(Z) in the optimal assignment. That is, the constant offset N_(T)/2 is not added.

In certain examples, an assignment is said to improve by a factor ƒ over random if it has H_(Z)≥ƒN_(T). Further, in certain examples, an assignment is said to be worse than random by a factor ƒ if it has H_(Z)≤−ƒN_(T).
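
A minimal Python sketch of these conventions follows. It assumes a hypothetical instance format, a list of (weight, index-tuple) pairs with weights in {−1, +1}, which this disclosure does not prescribe; the helper names are illustrative only. The sketch evaluates H_(Z), counts satisfied constraints (weighted monomials equal to +1), and reports the factor ƒ=H_(Z)/N_(T).

```python
from itertools import product

# Hypothetical instance format: a list of (weight, indices) pairs, where weight is
# -1 or +1 and indices is a tuple of distinct variable positions (0-based).

def h_z(terms, z):
    """Objective H_Z(z) for an assignment z with entries in {-1, +1}."""
    total = 0
    for weight, idx in terms:
        mono = weight
        for i in idx:
            mono *= z[i]
        total += mono
    return total

def satisfied(terms, z):
    """Number of weighted monomials equal to +1, i.e. (H_Z + N_T) / 2."""
    return sum(1 for term in terms if h_z([term], z) == 1)

def improvement_factor(terms, z):
    """The f with H_Z = f * N_T; f < 0 means worse than random by a factor -f."""
    return h_z(terms, z) / len(terms)

def mean_over_assignments(terms, n):
    """Average of H_Z over all 2^n assignments (0 for nonconstant monomials)."""
    values = [h_z(terms, z) for z in product((-1, 1), repeat=n)]
    return sum(values) / len(values)

terms = [(1, (0, 1)), (-1, (1, 2)), (1, (0, 2))]
z = (1, 1, -1)
print(h_z(terms, z), satisfied(terms, z), improvement_factor(terms, z))  # 1 2 0.3333333333333333
```

On this small triangle instance the assignment gives H_(Z)=1, two satisfied constraints out of three, and ƒ=1/3, while the average of H_(Z) over all assignments is 0.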

It is known that it is always possible to obtain an Ω(1/D) approximation ratio with a polynomial-time algorithm for MAX-K-LIN-2 (see, e.g., J. Hastad, Information Processing Letters 74, 1 (2000)). More strongly, for any instance it is always possible to find an assignment which improves by a factor 1/D over random in polynomial expected time. For odd K, however, it is possible to improve over a random assignment by exp(−O(K))/√D in polynomial time (see, e.g., B. Barak, A. Moitra, R. O'Donnell, P. Raghavendra, O. Regev, D. Steurer, L. Trevisan, A. Vijayaraghavan, D. Witmer, and J. Wright, arXiv preprint arXiv:1505.03424 (2015)).

One cannot expect such an improvement for even K, simply because there exist families of instances in which no assignment has H_(Z) larger than N_(T)·(1/D). For K=2, a simple such example is to choose

$H_{Z} = -\sum_{i<j} Z_{i} Z_{j}. \qquad (3)$

Here one has D=N−1, so that every variable is in some monomial with every other variable. It is possible to obtain a very large negative value of H_(Z) (namely −N(N−1)/2=−N_(T)) by choosing all Z_(i) to have the same sign, but for even N the maximum positive value of H_(Z) is obtained by choosing N/2 of the Z_(i) to equal +1 and the remainder to equal −1, giving the value N/2, which equals N_(T)/D. This example provides an early illustration of the duality: the maximum improvement over random is quite small (O(1/D)), but one can find an assignment that is a factor Ω(1) worse than random.
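
The extremes of Eq. (3) can be checked by brute force for small N. The Python sketch below (function names hypothetical) enumerates all 2^N assignments, so it is exponential in N and meant only as a sanity check of the values N_(T)/D and −N_(T).

```python
from itertools import combinations, product

def eq3_value(z):
    """H_Z = -sum_{i<j} Z_i Z_j from Eq. (3)."""
    return -sum(z[i] * z[j] for i, j in combinations(range(len(z)), 2))

def extremes(n):
    """Brute-force (max, min) of Eq. (3) over all 2^n assignments."""
    values = [eq3_value(z) for z in product((-1, 1), repeat=n)]
    return max(values), min(values)

n = 6
n_terms = n * (n - 1) // 2   # N_T
d = n - 1                    # every variable appears in D = N - 1 terms
best, worst = extremes(n)
print(best, n_terms / d)     # 3 3.0   (best value is only N_T / D)
print(worst, -n_terms)       # -15 -15 (worst value is -N_T)
```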

For K=2m, one can generalize example (3) to give an instance of MAX-EK-LIN-2 as follows: let N=mD. Divide the set of mD variables into D disjoint sets, each containing m variables, and label the sets by integers 1, . . . , D. Let Z̃_(i) be the product of the variables in the i-th set, and let H_(Z)=−Σ_(i<j)Z̃_(i)Z̃_(j).
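
The construction for K=2m can likewise be sketched in Python. The layout of the sets (0-based set a holding variables a·m, …, a·m+m−1) and the function names are assumptions made for illustration. The sketch builds the monomials, checks that each has degree 2m and that each variable appears in D−1 of them, and brute-forces a small case; as in example (3), the best value is small while the worst is −N_(T).

```python
from collections import Counter
from itertools import combinations, product

def even_k_instance(m, d):
    """Terms (weight, indices) of H_Z = -sum_{i<j} Ztilde_i Ztilde_j, K = 2m, N = m*d.

    Set a (0-based) holds variables a*m, ..., a*m + m - 1; Ztilde_a is their product.
    """
    sets = [tuple(range(a * m, (a + 1) * m)) for a in range(d)]
    return [(-1, sets[a] + sets[b]) for a, b in combinations(range(d), 2)]

def value(terms, z):
    """H_Z(z) for z in {-1, +1}^N."""
    total = 0
    for w, idx in terms:
        mono = w
        for i in idx:
            mono *= z[i]
        total += mono
    return total

m, d = 2, 4
terms = even_k_instance(m, d)
assert all(len(idx) == 2 * m for _, idx in terms)          # degree exactly K = 2m
occurrences = Counter(i for _, idx in terms for i in idx)
assert set(occurrences.values()) == {d - 1}                 # each variable in D - 1 terms
values = [value(terms, z) for z in product((-1, 1), repeat=m * d)]
print(max(values), min(values))                             # 2 -6
```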

B. Outline, Notation, and Results

In section III, an example embodiment of the quench algorithm is defined, both in the specific form that is analyzed later and in some variants that may be useful. In section IV, some results are collected that will be useful in analyzing both the classical algorithm given later and the quantum algorithm. In section V, the classical algorithm is defined and analyzed; this section also introduces some notation and probabilistic results that will be used for the quantum algorithm. In section VI, the example embodiment of the quantum algorithm is analyzed.

III. Quench Algorithm

To define the algorithm, one can promote the variables to qubits and let Z_(i) be the Pauli Z operator on the i-th qubit. Let X_(i) be the Pauli X operator on the i-th qubit and let

$X = \sum_{i} X_{i}. \qquad (4)$

The following algorithm can then be used. Let

$H = X + \frac{\alpha}{D} H_{Z}, \qquad (5)$

where α is a scalar chosen later. The system can be prepared in the state ψ₊, maximally polarized in the + direction so that X_(i)=+1 for all i. One can then evolve the system under Hamiltonian H for a time T that will be explained later. This time will in all cases be at most poly(N); indeed, the analysis will be for T=O(1). Hence, this evolution can be performed on a quantum computer in time polynomial in N, in T, and in the inverse error (see, e.g., D. W. Berry, A. M. Childs, and R. Kothari, 2015 IEEE Symposium on Foundations of Computer Science, 792 (2015)); indeed, the simulation can be performed in time polylogarithmic in the inverse error, but this will not be needed. In any simulation algorithm on a quantum computer, one can discretize the evolution time; for example, one may choose it to equal an integer multiple of some t_(min) which is polynomially small, which causes only a polynomially small error. Finally, one can measure the state of the system in the computational basis, giving an assignment of the variables Z_(i).
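
The steps above can be emulated on a classical computer for very small N, in the spirit of the classical-simulation embodiments mentioned in the summary. The following Python/NumPy sketch is an illustration under stated assumptions rather than the disclosed implementation: it builds dense 2^N×2^N matrices, uses an exact eigendecomposition in place of the Hamiltonian-simulation algorithms cited above, and maps qubit state |0⟩ to Z_(i)=+1. The function names and the (weight, index-tuple) instance format are hypothetical.

```python
import numpy as np

def x_sum(n):
    """Dense matrix of X = sum_i X_i on n qubits (X_i flips bit i)."""
    dim = 2 ** n
    mat = np.zeros((dim, dim))
    for i in range(n):
        for b in range(dim):
            mat[b ^ (1 << i), b] += 1.0
    return mat

def hz_diag(n, terms):
    """Diagonal of H_Z; bit i = 0 means Z_i = +1 and bit i = 1 means Z_i = -1."""
    diag = np.zeros(2 ** n)
    for b in range(2 ** n):
        z = [1 - 2 * ((b >> i) & 1) for i in range(n)]
        for w, idx in terms:
            diag[b] += w * np.prod([z[i] for i in idx])
    return diag

def quench(n, terms, d, alpha, t, rng=None):
    """Evolve psi_+ under H = X + (alpha/D) H_Z for time t, then sample Z."""
    x = x_sum(n)
    hz = hz_diag(n, terms)
    evals, evecs = np.linalg.eigh(x + (alpha / d) * np.diag(hz))
    psi0 = np.full(2 ** n, 2.0 ** (-n / 2))          # X_i = +1 on every qubit
    psi_t = evecs @ (np.exp(-1j * evals * t) * (evecs.conj().T @ psi0))
    probs = np.abs(psi_t) ** 2
    exp_x = float(np.real(psi_t.conj() @ x @ psi_t))
    exp_hz = float(probs @ hz)
    if rng is None:
        rng = np.random.default_rng()
    b = int(rng.choice(2 ** n, p=probs / probs.sum()))  # computational-basis measurement
    sample = [1 - 2 * ((b >> i) & 1) for i in range(n)]
    return exp_x, exp_hz, sample

# Hypothetical MAX-E2-LIN-2 instance on a 6-cycle (D = 2) with alternating signs.
n, d, alpha = 6, 2, 1.0
terms = [((-1) ** i, (i, (i + 1) % n)) for i in range(n)]
ex, ehz, z = quench(n, terms, d, alpha, t=0.5, rng=np.random.default_rng(0))
print(round(ex + (alpha / d) * ehz, 9))   # 6.0, i.e. <H> = N is conserved
```

Because ⟨X⟩+(α/D)⟨H_(Z)⟩ stays equal to N at every time, any drop of ⟨X⟩ below N shows up as a positive ⟨H_(Z)⟩, which is the mechanism the following subsection explains.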

In the analysis of the algorithm, one can ignore all the errors associated with the time evolution and the discretization of time, since a polynomially small error is negligible as may be verified.

When applying this algorithm, one can repeat it several times with T chosen from an appropriate distribution as described in section VI, measuring H_(Z) each time. In this regard, it is interesting to consider the state obtained by averaging over T in an interval of times: by choosing the time from a random distribution (or, more generally, by performing phase estimation of the Hamiltonian H), one decoheres the system in the eigenbasis of H. Evolution for a fixed time has a similar effect but is easier to analyze using the techniques here. Also, it may be useful to consider a generalization of the algorithm in which one performs some slow (but not necessarily adiabatic) evolution of the Hamiltonian from an initial Hamiltonian X to H=X+(α/D)H_(Z), followed by an additional time evolving under H=X+(α/D)H_(Z).

A. Motivation

In this section, the algorithm is explained heuristically. The time evolution has two purposes. The first is to decohere different eigenstates of the Hamiltonian, as mentioned: evolution for a fixed time t still produces a pure state, but it produces different phases for different energies, which has an effect similar to evolving for a random time. The second purpose is to do this in a way that conserves energy. One hopes that the decoherence between different eigenstates will lead to a reduction in the expectation value of X, since one hopes that individual eigenstates will not have large X. By energy conservation, this reduction leads to a positive expectation value of H_(Z), as will now be explained.

For arbitrary operators O, H and scalar t, define $\tau_{t}^{H}(O) = \exp(itH)\, O \exp(-itH)$. Define

$\langle O \rangle_{+} \equiv \langle \psi_{+} | O | \psi_{+} \rangle. \qquad (6)$

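One has

$\langle \tau_{T}^{H}(H) \rangle_{+} = \langle H \rangle_{+} = N, \qquad (7)$

independent of T, by energy conservation. Here ⟨H⟩₊=N because ⟨X⟩₊=N and ⟨H_(Z)⟩₊=0, since every monomial has zero expectation value in ψ₊. Hence, one has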

$\langle \tau_{T}^{H}(H_{Z}) \rangle_{+} = D\,\frac{N - \langle \tau_{T}^{H}(X) \rangle_{+}}{\alpha}. \qquad (8)$

That is, if the state at time T has an expectation value of X that is smaller than the maximum (i.e., smaller than N), it necessarily has a positive expectation value of H_(Z). In other words, it has obtained a solution that is, on average, better than random. This is the key idea behind the quench algorithm.

B. Heuristic Choices of α

This section discusses how to choose α. A calculation is given that introduces some of the notation used later. Perturbation theory is considered only to second order, and a purely heuristic treatment of higher orders is given to motivate the choice of α. Later, a different treatment is given.

Consider the series for $\tau_{T}^{H}(X_{i})$ for some given i: for any operator O, one has the series

$\begin{matrix} {{\tau_{T}^{H}(O)} = {O - {{iT}\left\lbrack {O,H} \right\rbrack} - {\frac{T^{2}}{2}\left\lbrack {\left\lbrack {O,H} \right\rbrack,H} \right\rbrack} + {i{\frac{T^{3}}{3!}\left\lbrack {\left\lbrack {\left\lbrack {O,H} \right\rbrack,H} \right\rbrack,H} \right\rbrack}} + {\frac{T^{4}}{4!}\left\lbrack {\left\lbrack {\left\lbrack {\left\lbrack {O,H} \right\rbrack,H} \right\rbrack,H} \right\rbrack,H} \right\rbrack} + \ldots}} & (9) \end{matrix}$

So, one has

$\begin{matrix} {{{\tau_{T}^{H}(X)} = {X - {i{\frac{\alpha \; T}{D}\left\lbrack {X,H_{Z}} \right\rbrack}} - {\frac{\alpha^{2}}{D^{2}}{\frac{T^{2}}{2}\left\lbrack {\left\lbrack {X,H_{Z}} \right\rbrack,H} \right\rbrack}} + \ldots}}\;,} & (10) \end{matrix}$

where the dots denote terms of order T³ or higher. Hence,

$\langle \tau_{T}^{H}(X) \rangle_{+} = \left\langle X - \frac{\alpha^{2}}{D^{2}}\frac{T^{2}}{2}[[X,H_{Z}],H] \right\rangle_{+} + \ldots = N - \frac{\alpha^{2}}{D^{2}}\frac{T^{2}}{2}ND + \ldots, \qquad (11)$

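where the first-order term vanishes because ψ₊ is an eigenstate of X, and where the following is used:

$\langle [[X, H_{Z}], H] \rangle_{+} = \sum_{i} \langle [[X_{i}, H_{Z}], H_{Z}] \rangle_{+} = ND,$

and so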

$\langle H_{Z} \rangle = \frac{\alpha T^{2}}{2} N + \ldots \qquad (12)$

Of course, the higher-order corrections to this perturbation theory must become important for large enough T and α. For one thing, once T≳1, the effects of higher-order terms in TX in the exponential become important; e.g., one must consider higher-order commutators such as [[[[X, H_(Z)], X], X], H_(Z)]. However, one might hope that for some T of order unity (for example T=½) the neglected higher orders in TX will not be too important; they may not be negligible, but one can hope that they will only slightly reduce the result.

However, even for such a fixed T=½, one cannot ignore higher-order terms in (α/D)H_(Z) for large enough α. For example, if α is sufficiently larger than √D, one would find that Eq. (11) gives a result for ⟨X⟩ which is smaller than −N, which is impossible.

So, the most optimistic outcome that one can hope for is that second-order perturbation theory is roughly accurate up to some T of order unity such as T=½ and up to α proportional to √D. If so, the best choice would be to take α proportional to √D, in which case one would have ⟨H_(Z)⟩ proportional to N√D, which is proportional to N_(T)·Ω(1/√D) since N_(T)=DN/K for fixed K.

However, this heuristic analysis is likely too optimistic. Such solutions do exist for MAX-EK-LIN-2 with odd K (though it has not been shown that the algorithm finds them), but they do not exist in general, as example (3) shows.

IV. Rounding

In this section, some general results are given on how, given a solution to an optimization problem for a polynomial in several vectorial variables, one can construct a solution to the same problem where all variables are chosen to be the same. Theorem 1 is the main result. This result will be used in both the classical and quantum algorithms; the vectors $\vec{w}_{a}$ are the solution to the problem using several vectorial variables, while $\vec{u}$ is the solution with all variables the same.

These results are used as follows. Suppose H_(Z) is given as a degree-K polynomial in the variables Z_(i), with each Z_(i) chosen from {−1, +1}. Let $\vec{Z}$ be a vector of choices of the variables Z_(i), and write $H_{Z}(\vec{Z})$ for the value of H_(Z) for that set of choices.

One can randomly round choices of Z_(i) from the interval [−1, +1] to choices of Z_(i) from the discrete set {−1, +1} while preserving the expectation value. Formally, consider a vectorial variable $\vec{v}$ with each entry chosen from the interval [−1, +1]. Then, choosing each Z_(i) independently at random from {−1, +1}, with the probability for each Z_(i) picked so that $\mathbb{E}[Z_{i}] = v_{i}$ (i.e., Z_(i)=+1 with probability (1+v_(i))/2), one has $\mathbb{E}[H_{Z}(\vec{Z})] = H_{Z}(\vec{v})$, since each monomial is a product of distinct, independently rounded variables.
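
The rounding step can be illustrated as follows. This Python sketch (helper names hypothetical; the instance format is an assumed list of (weight, index-tuple) pairs) rounds each entry independently with P(Z_(i)=+1)=(1+v_(i))/2 and, for a small instance, computes the exact expectation over all 2^N outcomes to confirm that it equals H_(Z) evaluated at the fractional vector.

```python
import random
from itertools import product

def h_z(terms, z):
    """Evaluate H_Z at a real vector z (entries in [-1, 1] or {-1, +1})."""
    total = 0.0
    for w, idx in terms:
        mono = w
        for i in idx:
            mono *= z[i]
        total += mono
    return total

def round_once(v, rng):
    """Independently round each v_i in [-1, 1] to +1 with probability (1 + v_i)/2."""
    return [1 if rng.random() < (1 + vi) / 2 else -1 for vi in v]

def exact_rounded_expectation(terms, v):
    """E[H_Z(Z)] under independent rounding, summed over all 2^N outcomes."""
    expectation = 0.0
    for z in product((-1, 1), repeat=len(v)):
        prob = 1.0
        for zi, vi in zip(z, v):
            prob *= (1 + zi * vi) / 2        # P(Z_i = zi), which makes E[Z_i] = v_i
        expectation += prob * h_z(terms, z)
    return expectation

terms = [(1, (0, 1)), (-1, (1, 2, 3)), (1, (0, 3))]
v = [0.5, -0.2, 0.9, 0.1]
print(exact_rounded_expectation(terms, v), h_z(terms, v))  # equal up to rounding error
print(round_once(v, random.Random(0)))
```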

Now, define a polynomial $H_{Z}(\vec{v}_{1}, \vec{v}_{2}, \ldots, \vec{v}_{K})$ which depends upon K different vectorial variables as follows. This polynomial will be homogeneous of degree 1 in each variable. For each term in H_(Z) of the form $c Z_{i_{1}} Z_{i_{2}} \cdots Z_{i_{K}}$, where c is a scalar and i₁, i₂, . . . , i_(K) are a sequence of distinct choices of i, one has a corresponding term in $H_{Z}(\vec{v}_{1}, \ldots, \vec{v}_{K})$ equal to

${c\frac{1}{K!}{\sum\limits_{\pi}{\left( {\overset{\rightarrow}{\upsilon}}_{1} \right)_{i_{\pi {(1)}}}\left( {\overset{\rightarrow}{\upsilon}}_{2} \right)_{i_{\pi {(1)}}}{\ldots \left( {\overset{\rightarrow}{\upsilon}}_{K} \right)}_{i_{\pi {(1)}}}}}},$

where the sum is over permutations π on K elements and $(\vec{v}_{a})_{b}$ denotes the b-th entry of vector $\vec{v}_{a}$. For example, for K=2, given a term −Z₂Z₃, one has the corresponding term $-\frac{1}{2}(\vec{v}_{1})_{2}(\vec{v}_{2})_{3} - \frac{1}{2}(\vec{v}_{1})_{3}(\vec{v}_{2})_{2}$. Here, in an abuse of notation, the same symbol H_(Z)(·) is used for two different functions, one depending on K vectorial arguments and one depending on a single vectorial argument.

Note that

$H_{Z}(\vec{v}, \vec{v}, \ldots, \vec{v}) = H_{Z}(\vec{v}). \qquad (13)$
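
The symmetrized multilinear form can be computed directly from its definition. The Python sketch below (function names hypothetical) assumes every monomial has degree exactly K and sums over all K! permutations, which is practical only for small K. It reproduces the K=2 example for the term −Z₂Z₃ and can be used to check identity (13).

```python
from itertools import permutations
from math import factorial

def multilinear(terms, vs):
    """H_Z(v_1, ..., v_K): each c Z_{i_1}...Z_{i_K} becomes
    (c / K!) * sum_pi (v_1)_{i_pi(1)} (v_2)_{i_pi(2)} ... (v_K)_{i_pi(K)}."""
    k = len(vs)
    total = 0.0
    for c, idx in terms:
        if len(idx) != k:
            raise ValueError("this sketch assumes every monomial has degree exactly K")
        acc = 0.0
        for pi in permutations(range(k)):
            prod = 1.0
            for a in range(k):
                prod *= vs[a][idx[pi[a]]]
            acc += prod
        total += c * acc / factorial(k)
    return total

def single(terms, v):
    """H_Z(v) with a single vectorial argument."""
    total = 0.0
    for c, idx in terms:
        prod = 1.0
        for i in idx:
            prod *= v[i]
        total += c * prod
    return total

# K = 2 example from the text: the term -Z_2 Z_3 (list positions 2 and 3).
v1, v2 = [0, 0, 1, 2], [0, 0, 3, 5]
print(multilinear([(-1, (2, 3))], [v1, v2]))   # -5.5, i.e. -(1*5 + 2*3)/2
```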

The purpose of the rounding results is as follows: given some choice of $\vec{v}_{1}, \ldots, \vec{v}_{K}$ such that $H_{Z}(\vec{v}_{1}, \ldots, \vec{v}_{K})$ has a certain magnitude, one can find a choice of $\vec{v}$ such that $H_{Z}(\vec{v})$ obeys certain conditions on its magnitude.

This will then be used in the classical setting in the following simple way: one can pick some vector $\vec{w}_{2}$ at random and then choose $\vec{w}_{1}$ greedily to optimize $H_{Z}(\vec{w}_{1}, \vec{w}_{2}, \vec{w}_{2}, \ldots, \vec{w}_{2})$. Here the variable $\vec{w}_{1}$ appears once while the variable $\vec{w}_{2}$ appears K−1 times. This gives a choice of K vectorial variables (though one variable is repeated K−1 times) from which one constructs a solution with a single variable.
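
Because $H_{Z}(\vec{w}_{1}, \vec{w}_{2}, \ldots, \vec{w}_{2})$ is linear in $\vec{w}_{1}$, one natural reading of choosing $\vec{w}_{1}$ greedily is to set each entry to the sign of its coefficient, which maximizes a linear function over {−1, +1}^N. The Python sketch below takes that reading and draws $\vec{w}_{2}$ uniformly from {−1, +1}^N; both choices, the function names, and the small instance are assumptions for illustration. It uses the fact that symmetrizing cZ_(i₁)⋯Z_(i_K) and setting all but one argument to $\vec{w}_{2}$ gives (c/K)Σ_j (w₁)_(i_j) Π_(l≠j) (w₂)_(i_l).

```python
import random

def linear_coefficients(terms, w2, n):
    """g such that H_Z(w1, w2, ..., w2) = sum_i g_i (w1)_i, for degree-K terms.

    Each c Z_{i_1}...Z_{i_K} contributes (c/K) * prod_{l != j} (w2)_{i_l} to g_{i_j}.
    """
    g = [0.0] * n
    for c, idx in terms:
        k = len(idx)
        for j, ij in enumerate(idx):
            rest = 1.0
            for pos, other in enumerate(idx):
                if pos != j:
                    rest *= w2[other]
            g[ij] += c * rest / k
    return g

def greedy_pair(terms, n, rng):
    """Random w2 in {-1, +1}^N, then w1 = sign of the coefficients (ties -> +1)."""
    w2 = [rng.choice((-1, 1)) for _ in range(n)]
    g = linear_coefficients(terms, w2, n)
    w1 = [1 if gi >= 0 else -1 for gi in g]
    value = sum(gi * wi for gi, wi in zip(g, w1))   # equals sum_i |g_i|
    return w1, w2, value

terms = [(1, (0, 1, 2)), (-1, (1, 2, 3)), (1, (0, 2, 3))]   # hypothetical K = 3 instance
w1, w2, value = greedy_pair(terms, 4, random.Random(3))
print(w1, w2, value)
```

Since $\vec{w}_{1}=\vec{w}_{2}$ is one of the candidates, the greedy value is never below $H_{Z}(\vec{w}_{2})$.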

Case 3 of the following theorem will be useful for the classical algorithm. Case 2 follows from case 3 with ϵ=1 but has slightly tighter bounds. Case 1 is given for completeness. Thus, the reader may consider only case 3.

Theorem 1. Let $P(\vec{v}_{1}, \vec{v}_{2}, \ldots, \vec{v}_{K})$ be a polynomial in vectorial variables $\vec{v}_{1}, \ldots, \vec{v}_{K}$ which is homogeneous of degree 1 in each argument, so that

$\begin{matrix} {{{P\left( {{\overset{\rightarrow}{\upsilon}}_{1},\ldots \;,{\overset{\rightarrow}{\upsilon}}_{K}} \right)} = {\sum\limits_{i_{1},\; \ldots \;,\; i_{K}}{a_{i_{1},\; \ldots \;,\; i_{K}}{\prod\limits_{a}\; \left( {\overset{\rightarrow}{\upsilon}}_{a} \right)_{i_{a}}}}}},} & (14) \end{matrix}$

where $(\vec{v}_{a})_{i}$ denotes the i-th entry of vector $\vec{v}_{a}$.

Assume that all vectors $\vec{v}_{a}$ have the same number of entries, and assume that P is symmetric under permuting its arguments, i.e., that $a_{i_{1}, \ldots, i_{K}}$ is symmetric under permuting its indices.

Then the following holds:

1. Suppose that there exist some vectors $\vec{w}_{1}, \ldots, \vec{w}_{K}$ such that $P(\vec{w}_{1}, \ldots, \vec{w}_{K}) = C$ and such that $|(\vec{w}_{a})_{i}| \leq 1$ for all a, i. Then, there exists some vector $\vec{u}$ with $|(\vec{u})_{i}| \leq 1$ for all i such that

$|P(\vec{u}, \vec{u}, \ldots, \vec{u})| \geq \frac{K!}{K^{K}} C. \qquad (15)$

2. Suppose that there exist vectors $\vec{w}_{1}, \vec{w}_{2}$ such that $P(\vec{w}_{1}, \vec{w}_{2}, \vec{w}_{2}, \ldots, \vec{w}_{2}) = C$ and such that $|(\vec{w}_{a})_{i}| \leq 1$ for all a, i. (That is, the variable $\vec{w}_{1}$ appears once while the variable $\vec{w}_{2}$ appears K−1 times.) Then, there exists some vector $\vec{u}$ with $|(\vec{u})_{i}| \leq |(\vec{w}_{1})_{i}| + |(\vec{w}_{2})_{i}|$ for all i such that

$|P(\vec{u}, \vec{u}, \ldots, \vec{u})| \geq P(\vec{w}_{2}, \vec{w}_{2}, \ldots, \vec{w}_{2}) + C \cdot \Omega(1/K). \qquad (16)$

3. Suppose that there exist vectors $\vec{w}_{1}, \vec{w}_{2}$ such that $P(\vec{w}_{1}, \vec{w}_{2}, \vec{w}_{2}, \ldots, \vec{w}_{2}) = C$. Then, for any ϵ>0, at least one of the following two possibilities holds:

A. There exists some vector $\vec{u}$ with $|(\vec{u})_{i}| \leq |(\vec{w}_{1})_{i}| + |(\vec{w}_{2})_{i}|$ for all i such that

$|P(\vec{u}, \vec{u}, \ldots, \vec{u})| \geq P(\vec{w}_{2}, \vec{w}_{2}, \ldots, \vec{w}_{2}) + \epsilon C \cdot \Omega(1), \qquad (17)$

or

B. There exists some vector $\vec{u}$ with $|(\vec{u})_{i}| \leq |(\vec{w}_{1})_{i}| + |(\vec{w}_{2})_{i}|$ for all i such that

$|P(\vec{u}, \vec{u}, \ldots, \vec{u})| \leq P(\vec{w}_{2}, \vec{w}_{2}, \ldots, \vec{w}_{2}) - C \cdot \exp(-O(K))/\epsilon. \qquad (18)$

Further, in all cases, one can find $\vec{u}$ up to any desired nonzero error in time linear in N, exponential in K, and at most polynomial in the inverse error (relative to the magnitude of the terms in the polynomial).

Note that item 1 above allows all of the $\vec{w}_{a}$ to be distinct. Items 2 and 3 consider the case of just two different vectors $\vec{w}_{a}$, with $\vec{w}_{2}$ repeated K−1 times in the argument of P(·). One can summarize item 2 as saying that one can obtain a solution whose absolute value is close to C, while item 3 can be summarized, for small ϵ, as saying that, compared to $P(\vec{w}_{2}, \vec{w}_{2}, \ldots, \vec{w}_{2})$, either one can improve by a small amount (this is the “pretty good” case) or there is a solution which is much worse (this is the “very bad” case). Note also that the bound on $|(\vec{u})_{i}|$ in items 2 and 3 differs from the bound in item 1.

One can now prove the theorem. Define a function $\vec{u}(\cdot)$, from $\mathbb{R}^{K}$ to vectors, by

$\vec{u}(x_{1}, \ldots, x_{K}) = \sum_{a} x_{a} \vec{w}_{a}, \qquad (19)$

where $x_{a}\vec{w}_{a}$ denotes the vector with i-th entry equal to $x_{a}(\vec{w}_{a})_{i}$.

One can first prove item 1 of theorem 1. One needs:

Lemma 1. Let p(x₁, . . . , x_(K)) be a polynomial (not necessarily homogeneous) of degree at most K in real variables x₁, . . . , x_(K). Suppose that the coefficient of the term Π_(i)x_(i) in p(·) is equal to C. Then, for some choice of (x₁, . . . , x_(K)) ∈ {−1, +1}^(K), one has |p(x₁, . . . , x_(K))|≥|C|.

Proof. It is claimed that

$C = \frac{1}{2^{K}} \sum_{(x_{1}, \ldots, x_{K}) \in \{-1,+1\}^{K}} \left(\prod_{i} x_{i}\right) \cdot p(x_{1}, \ldots, x_{K}). \qquad (20)$

This holds because any term in p(·) proportional to Π_(i) x_(i)^(d_(i)) for some sequence of nonnegative integers d_(i) vanishes in the weighted sum above unless all d_(i) are odd. All d_(i) odd implies d_(i)≥1 for every i, and since p(·) has degree at most K, the only such nonvanishing term is the one with all d_(i)=1, which contributes exactly C.

Hence, $|C| \leq \max_{(x_{1}, \ldots, x_{K}) \in \{-1,+1\}^{K}} |p(x_{1}, \ldots, x_{K})|$. □
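
Identity (20) and the conclusion of Lemma 1 can be checked numerically. In this Python sketch, a polynomial is stored as a dictionary from exponent tuples (d₁, …, d_K) to coefficients, a representation chosen here purely for illustration (the helper names are hypothetical). The sketch averages (Π_i x_i)·p(x) over {−1, +1}^K and compares the result with the coefficient of Π_i x_i.

```python
import random
from itertools import product

def evaluate(poly, x):
    """Evaluate a polynomial stored as {(d_1, ..., d_K): coefficient} at x."""
    total = 0.0
    for exps, coef in poly.items():
        term = coef
        for xi, di in zip(x, exps):
            term *= xi ** di
        total += term
    return total

def average_with_parity(poly, k):
    """Right-hand side of Eq. (20): 2^{-K} sum_x (prod_i x_i) p(x) over x in {-1,+1}^K."""
    acc = 0.0
    for x in product((-1, 1), repeat=k):
        parity = 1
        for xi in x:
            parity *= xi
        acc += parity * evaluate(poly, x)
    return acc / 2 ** k

def random_poly(k, rng):
    """Random (non-homogeneous) polynomial of total degree at most K in K variables."""
    return {exps: rng.uniform(-1, 1)
            for exps in product(range(k + 1), repeat=k) if sum(exps) <= k}

k = 3
p = random_poly(k, random.Random(7))
c = p[(1,) * k]                                    # coefficient of x_1 x_2 x_3
print(abs(average_with_parity(p, k) - c) < 1e-12)  # True
corners = [abs(evaluate(p, x)) for x in product((-1, 1), repeat=k)]
print(max(corners) >= abs(c) - 1e-12)              # True, as Lemma 1 asserts
```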

To prove item 1, consider the polynomial $Q(x_{1}, \ldots, x_{K}) \equiv P(\vec{u}(x_{1}, \ldots, x_{K}), \ldots, \vec{u}(x_{1}, \ldots, x_{K}))$. The polynomial Q(·) has degree K and, since P is multilinear and symmetric, the coefficient of Π_(i) x_(i) in Q(·) is equal to CK!. So, by Lemma 1, there exists some choice of (x₁, . . . , x_(K)) ∈ {−1, +1}^(K) such that |Q(x₁, . . . , x_(K))|≥|C|K!. Set

$\vec{u} = \frac{1}{K}\,\vec{u}(x_{1}, \ldots, x_{K}),$

so that indeed $|(\vec{u})_{i}| \leq 1$ for all i.

Then, since $P(\cdot)$ is linear in each of its $K$ arguments, $|P(\vec{u},\ldots,\vec{u})|=K^{-K}|Q(x_1,\ldots,x_K)|\ge(1/K)^K|C|K!$. This proves item 1. The choice of $\vec{u}$ can be found by iterating over the $2^K$ possible choices of $(x_1,\ldots,x_K)\in\{-1,+1\}^K$.
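The search in item 1 can be sketched as follows. For this illustration only, the $K$-linear form $P(\cdot)$ is assumed to be represented by a symmetric order-$K$ tensor, so that $P(\vec{v}_1,\ldots,\vec{v}_K)$ is a full contraction. The helper names `symmetric_tensor`, `P`, and `item1_search` are hypothetical. The brute force visits $2^K$ sign vectors, matching the remark above.

```python
import itertools
import math

import numpy as np


def symmetric_tensor(rng, n, K):
    """Random symmetric order-K tensor: a hypothetical stand-in for the K-linear form P."""
    A = rng.standard_normal((n,) * K)
    perms = list(itertools.permutations(range(K)))
    return sum(np.transpose(A, perm) for perm in perms) / len(perms)


def P(T, *vectors):
    """Evaluate P(v_1, ..., v_K) by contracting one tensor index with each vector."""
    out = T
    for v in vectors:
        out = np.tensordot(out, v, axes=([0], [0]))
    return float(out)


def item1_search(T, ws):
    """Try all 2^K sign vectors x; return u = (1/K) sum_a x_a w_a maximizing |P(u, ..., u)|."""
    K = len(ws)
    best_u, best_val = None, -1.0
    for x in itertools.product((-1, 1), repeat=K):
        u = sum(xa * wa for xa, wa in zip(x, ws)) / K
        val = abs(P(T, *([u] * K)))
        if val > best_val:
            best_u, best_val = u, val
    return best_u, best_val


rng = np.random.default_rng(0)
n, K = 4, 3
T = symmetric_tensor(rng, n, K)
ws = [rng.uniform(-1.0, 1.0, size=n) for _ in range(K)]  # entries in [-1, 1]
C = P(T, *ws)
u, val = item1_search(T, ws)
print(val, math.factorial(K) / K ** K * abs(C))  # the first number is at least the second
```

Because every $(\vec{w}_a)_i$ lies in $[-1,1]$, the returned $\vec{u}$ also has entries in $[-1,1]$.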

Next, item 2 is proven, using the following lemma:

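Lemma 2. Let $p(x)=\sum_{0\le i\le K}a_ix^i$ be a polynomial of degree at most $K$. Then, for $K$ odd,

$\begin{matrix} {\max\limits_{x\in[-1,1]}|p(x)|\ge|a_1|/K,} & (21) \end{matrix}$

and for $K$ even,

$\begin{matrix} {\max\limits_{x\in[-1,1]}|p(x)|\ge|a_1|/(K-1).} & (22) \end{matrix}$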

Proof. The proof is similar to the proof that the (suitably normalized) Chebyshev polynomials have the smallest maximum absolute value on the interval $[-1,1]$ among all polynomials with a given leading coefficient, i.e., with a given value of $a_K$. Here, one instead fixes the value of $a_1$, but the proof is almost the same.

First, without loss of generality, one can assume that $p(x)=-p(-x)$. The odd part $(p(x)-p(-x))/2$ is also a polynomial of degree at most $K$, its linear coefficient is also equal to $a_1$, and $|(p(x)-p(-x))/2|\le\max(|p(x)|,|p(-x)|)$. For even $K$, this odd polynomial has degree at most $K-1$, which is odd. So, one can assume that $K$ is odd. The result for even $K$ then follows immediately by applying the result for odd $K$ with $K-1$ in place of $K$.

Also, without loss of generality, one may assume that $a_1=1$. Indeed, if $a_1=0$, then the result is trivially true, while for any nonzero $a_1$ one can instead consider $p(x)/a_1$.

Assume that the lemma is false, i.e., assume that $p(x)$ has maximum absolute value on the interval $[-1,1]$ strictly smaller than $1/K$.

Let $T_n(x)$ denote the Chebyshev polynomials of the first kind. For odd $K$, the linear coefficient of $T_K(x)$ is $(-1)^{(K-1)/2}K$, so $(-1)^{(K-1)/2}T_K(x)/K$ is a polynomial of degree $K$ whose linear coefficient is equal to 1. Further, $(-1)^{(K-1)/2}T_K(x)/K$ has maximum absolute value on the interval $[-1,1]$ equal to $1/K$. It attains this maximum $K+1$ times on the interval, at the points $x=\cos(k\pi/K)$ for $0\le k\le K$.

Let $q(x)=p(x)-(-1)^{(K-1)/2}T_K(x)/K$. Then $q(x)$ has linear coefficient equal to zero. Since it is also an odd function of $x$, one has $q(x)=\sum_{i=3,5,\ldots,K}b_ix^i$ for some coefficients $b_i$. By the assumption that $|p(x)|$ is strictly smaller than $1/K$ on the interval, the sign of $q(x)$ at the points $x=\cos(k\pi/K)$ is the same as the sign of $-(-1)^{(K-1)/2}T_K(x)/K$. The sign of $T_K(x)$ alternates at these points (i.e., the sign for even $k$ is opposite to that for odd $k$). So $q(x)$ changes sign at least $K$ times and must have at least $K$ distinct zeros. However, $q(x)$ has degree at most $K$ and its root at $x=0$ is at least triply degenerate, so $q(x)$ can have at most $K-2$ distinct zeros. This is a contradiction. □
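The extremal polynomial used in this proof can be checked numerically. For odd $K$ the linear coefficient of $T_K(x)$ is $(-1)^{(K-1)/2}K$, so $(-1)^{(K-1)/2}T_K(x)/K$ has linear coefficient 1. The sketch below relies on NumPy's `numpy.polynomial` module and checks three things:

- the linear coefficient is 1;
- the maximum of the absolute value on $[-1,1]$ is $1/K$;
- the signs alternate at the $K+1$ points $\cos(k\pi/K)$.

```python
import numpy as np
from numpy.polynomial import chebyshev, polynomial


def normalized_chebyshev(K):
    """Monomial coefficients (lowest degree first) of (-1)^((K-1)/2) T_K(x) / K, K odd."""
    if K % 2 == 0:
        raise ValueError("K must be odd")
    c = np.zeros(K + 1)
    c[K] = 1.0
    return (-1) ** ((K - 1) // 2) * chebyshev.cheb2poly(c) / K


def check(K, num_grid=20001):
    """Linear coefficient, max |.| on [-1, 1], and signs at the K+1 points cos(k pi / K)."""
    coeffs = normalized_chebyshev(K)
    grid = np.linspace(-1.0, 1.0, num_grid)
    max_abs = np.max(np.abs(polynomial.polyval(grid, coeffs)))
    extremal = np.cos(np.arange(K + 1) * np.pi / K)
    signs = np.sign(polynomial.polyval(extremal, coeffs))
    return coeffs[1], max_abs, signs


for K in (1, 3, 5, 7, 9):
    linear, max_abs, signs = check(K)
    print(K, linear, max_abs * K, signs)  # the first two numbers are 1 up to rounding
```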

Define the polynomial $Q(x)\equiv P(\vec{u}(x,1,1,\ldots,1),\ldots,\vec{u}(x,1,1,\ldots,1))$, i.e., in the argument of $\vec{u}$, 1 is repeated a total of $K-1$ times. Applying lemma 2 to $p(x)=Q(x)$, the result follows. One can find an $x$ which maximizes $|Q(x)|$ up to any given error by exhaustively trying a discrete set of points on the interval $[-1,1]$, with the spacing between points chosen according to the desired error.
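Here is a minimal sketch of the grid search described above. It assumes the coefficients $a_0,\ldots,a_K$ of $Q(x)$ have already been computed; the random coefficients below are placeholders. Since $|Q'(x)|\le\sum_i i|a_i|$ on $[-1,1]$, a grid with spacing $h$ finds $\max|Q|$ up to an error of at most $(h/2)\sum_i i|a_i|$. This makes the dependence of the spacing on the desired error explicit.

```python
import numpy as np
from numpy.polynomial import polynomial


def max_abs_on_grid(coeffs, num_points):
    """Maximize |Q(x)| over a uniform grid on [-1, 1]; coeffs[i] is the coefficient of x^i.

    Also returns a rigorous bound on how far the grid maximum can fall below the true
    maximum: the Lipschitz constant sum_i i |a_i| times half the grid spacing.
    """
    grid = np.linspace(-1.0, 1.0, num_points)
    values = np.abs(polynomial.polyval(grid, coeffs))
    k = int(np.argmax(values))
    lipschitz = sum(i * abs(a) for i, a in enumerate(coeffs))
    return grid[k], values[k], lipschitz / (num_points - 1)


rng = np.random.default_rng(1)
K = 5
coeffs = rng.standard_normal(K + 1)  # hypothetical coefficients a_0, ..., a_K of Q(x)
x_best, q_best, err = max_abs_on_grid(coeffs, 10001)
lemma2 = abs(coeffs[1]) / (K if K % 2 == 1 else K - 1)  # Lemma 2 lower bound
print(x_best, q_best, err, lemma2)
```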

Finally, item 3 is proven. The following lemma is applied:

Lemma 3. Let $p(x)=\sum_{0\le i\le d}a_ix^i$ be a degree-$d$ polynomial in a real variable $x$. Let $p_{\max}=\max_{x\in[-1,1]}p(x)$. Let $a_{\max}=\max_{i\ge 1}|a_i|$. Then

$\begin{matrix} {p_{\max}\ge a_0+\frac{1}{6}\frac{a_1^2}{a_{\max}}.} & (23) \end{matrix}$

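Remark: the factor ⅙ in the above equation is not optimal and can easily be tightened. For $|a_1|\ll a_{\max}$, choosing $x_0=a_1/(2a_{\max})$ in the argument below gives a factor approaching ¼. No better factor is possible in general, as the example $p(x)=a_1x-a_{\max}x^2$ shows.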

Proof. Consider $p(x_0)$ for $x_0=a_1/(4a_{\max})$, which lies in $[-1/4,1/4]$ since $|a_1|\le a_{\max}$. One has $p(x_0)=a_0+\frac{1}{4}a_1^2/a_{\max}+\sum_{2\le i\le d}a_ix_0^i$. So,

$\begin{matrix} {p(x_0)\ge a_0+\frac{1}{4}\frac{a_1^2}{a_{\max}}-\sum\limits_{i\ge 2}a_{\max}\left(\frac{|a_1|}{4a_{\max}}\right)^i\ge a_0+\frac{1}{4}\frac{a_1^2}{a_{\max}}-a_{\max}\left(\frac{|a_1|}{4a_{\max}}\right)^2\sum\limits_{i\ge 2}\left(\frac{1}{4}\right)^{i-2}=a_0+\frac{1}{4}\frac{a_1^2}{a_{\max}}-\frac{1}{12}\frac{a_1^2}{a_{\max}}=a_0+\frac{1}{6}\frac{a_1^2}{a_{\max}}.} & (24) \end{matrix}$

Since $p_{\max}\ge p(x_0)$, this proves Eq. (23). □

Define the polynomial $Q(x)\equiv P(\vec{u}(x,1,1,\ldots,1),\ldots,\vec{u}(x,1,1,\ldots,1))$ as above, and apply lemma 3 to $p(x)=Q(x)$ with $a_1=C$. Suppose case 3A of theorem 1 does not hold for some given $\epsilon$. Then $\frac{1}{6}C^2/a_{\max}<\epsilon C$, so $a_{\max}>\frac{1}{6}(C/\epsilon)$. So, for some $i\ge 1$, $|a_i|>\frac{1}{6}(C/\epsilon)$. That is,

$|a_i|=\binom{K}{i}\left|P(\vec{w}_1,\ldots,\vec{w}_1,\vec{w}_2,\ldots,\vec{w}_2)\right|>\frac{1}{6}\frac{C}{\epsilon},$

where $\vec{w}_1$ appears $i$ times in the argument of $P(\cdot)$ and $\vec{w}_2$ appears $K-i$ times. So, by item 1 of theorem 1, there is some choice of $\vec{u}$ with $|(\vec{u})_j|\le 1$ for all $j$ such that

$\left|P(\vec{u},\vec{u},\ldots,\vec{u})\right|\ge\frac{1}{\binom{K}{i}}\frac{K!}{K^K}\cdot\frac{1}{6}\cdot\frac{C}{\epsilon}.$

Since $\binom{K}{i}\le 2^K$ and $K!/K^K\ge e^{-K}$,

$\frac{1}{\binom{K}{i}}\frac{K!}{K^K}\cdot\frac{1}{6}\ge\exp(-O(K)),$

the result follows.

This completes the proof.
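The final step of the proof of item 3 uses only two facts:

- $\binom{K}{i}\le 2^K$;
- $K!/K^K\ge e^{-K}$, because $e^K=\sum_n K^n/n!\ge K^K/K!$.

Together they give the explicit bound $(2e)^{-K}/6$. Here is a short arithmetic check of this, for illustration only:

```python
import math


def item3_prefactor(K, i):
    """The factor (1 / binom(K, i)) * (K! / K^K) * (1/6) from the bound for item 3."""
    return math.factorial(K) / (math.comb(K, i) * K ** K * 6)


def explicit_bound(K):
    """(2e)^(-K) / 6, from binom(K, i) <= 2^K and K!/K^K >= e^(-K)."""
    return (2 * math.e) ** (-K) / 6


for K in (2, 5, 10, 20, 30):
    worst = min(item3_prefactor(K, i) for i in range(1, K + 1))
    print(K, worst, explicit_bound(K), worst >= explicit_bound(K))
```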

V. Classical Algorithm

The classical optimization algorithm will now be described.

Some notation that will be useful both here and in the analysis of the quantum algorithm will now be introduced.

Define $F_i$ to equal $Z_i$ times the sum of the terms in $H_Z$ that include $Z_i$. The symbol “F” is for “force”, i.e., a derivative of energy with respect to some coordinate. For example, for $K=4$ and $H_Z=Z_1Z_2Z_3Z_4+Z_1Z_3Z_4Z_5$, one has $F_1=Z_2Z_3Z_4+Z_3Z_4Z_5$. The force depends on the choice of the $Z_j$ for $j\neq i$, so one will sometimes write $F_i(\vec{Z})$ to indicate its dependence on $\vec{Z}$.
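For concreteness, the force can be computed directly from a list of terms. The encoding of $H_Z$ as (coupling, sites) pairs below is an assumption made for illustration. The sketch reproduces the $K=4$ example above. It also checks two facts that follow from the definition:

- $F_i$ does not depend on $Z_i$;
- flipping $Z_i$ changes $H_Z$ by $-2Z_iF_i$, which is the sense in which $F_i$ is a “force”.

```python
import math


def force(i, terms, z):
    """F_i(z): Z_i times the sum of the terms of H_Z containing site i, at the +/-1 assignment z.

    terms is a list of (coupling, sites) pairs, an illustrative encoding of H_Z.
    """
    total = 0.0
    for coupling, sites in terms:
        if i in sites:
            total += coupling * math.prod(z[j] for j in sites)
    return z[i] * total


def energy(terms, z):
    """H_Z evaluated at the +/-1 assignment z."""
    return sum(coupling * math.prod(z[j] for j in sites) for coupling, sites in terms)


# The K = 4 example from the text: H_Z = Z1 Z2 Z3 Z4 + Z1 Z3 Z4 Z5 (sites labelled 1..5).
terms = [(1.0, (1, 2, 3, 4)), (1.0, (1, 3, 4, 5))]
z = {1: 1, 2: -1, 3: 1, 4: 1, 5: -1}
print(force(1, terms, z))  # Z2 Z3 Z4 + Z3 Z4 Z5 = -1 + (-1), prints -2.0
```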

A. Some Probability Bounds

In this subsection, some probability bounds are collected that will be used to analyze this algorithm, as well as the quantum algorithm.

Algorithm 1: Classical algorithm

1. Fix some real number $0<p<1$. Choose a set $S$ of degrees of freedom by including each degree of freedom in $S$ independently with probability $p$.
2. Define vector variables $\vec{w}_1,\vec{w}_2$ as follows; the index of each vector corresponds to the degrees of freedom. Let $\vec{w}_2$ be a vector with $(\vec{w}_2)_i=0$ for $i\in S$. For $i\notin S$, choose $(\vec{w}_2)_i$ to be $+1$ or $-1$ independently and uniformly at random. Choose the vector $\vec{w}_1$ so that $(\vec{w}_1)_i=0$ for $i\notin S$. For $i\in S$, choose $(\vec{w}_1)_i$ “greedily”: pick $(\vec{w}_1)_i=+1$ if $F_i(\vec{w}_2)>0$ and $(\vec{w}_1)_i=-1$ otherwise.
3. Finally, apply item 3 of theorem 1. By this item, for any $\epsilon>0$, one can find a choice of $\vec{u}$ satisfying one of two conditions, where $C=\sum_{i\in S}|F_i(\vec{w}_2)|$:
   - case 3A: $H_Z(\vec{u})\ge H_Z(\vec{w}_2)+\epsilon C\cdot\Omega(1)$;
   - case 3B: $H_Z(\vec{u})\le H_Z(\vec{w}_2)-C\cdot\Omega(1)/\epsilon$.
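Steps 1 and 2 of Algorithm 1 are straightforward to implement. The Python sketch below does so for an instance encoded as a list of (coupling, sites) pairs, which is an illustrative assumption. It replaces step 3 with a simplified, hypothetical line search over $\vec{u}(x)=x\vec{w}_1+\vec{w}_2$ for $x\in[-1,1]$, evaluating $H_Z$ on real vectors through its multilinear extension. This covers only the “pretty good” branch; the “very bad” branch of item 3 (via item 1) is not implemented. Because $\vec{w}_1$ and $\vec{w}_2$ have disjoint supports, the linear coefficient of $H_Z(x\vec{w}_1+\vec{w}_2)$ equals $C$, and the tests check this numerically.

```python
import math
import random


def evaluate(terms, u):
    """Multilinear extension of H_Z = sum_c c * prod_{j in sites} Z_j at a real vector u."""
    return sum(c * math.prod(u[j] for j in sites) for c, sites in terms)


def algorithm1_line_search(terms, n, p, num_grid=2001, seed=0):
    rng = random.Random(seed)
    # Step 1: include each degree of freedom in S independently with probability p.
    S = {i for i in range(n) if rng.random() < p}
    # Step 2: w2 is zero on S and uniformly random +/-1 off S.
    w2 = [0 if i in S else rng.choice((-1, 1)) for i in range(n)]
    # Greedy w1: zero off S; on S, the sign of the force F_i(w2).
    w1 = [0] * n
    C = 0.0
    for i in S:
        f = sum(c * math.prod(w2[j] for j in sites if j != i)
                for c, sites in terms if i in sites)
        w1[i] = 1 if f > 0 else -1
        C += abs(f)

    # Step 3, simplified (hypothetical): grid search over u(x) = x*w1 + w2 for x in [-1, 1],
    # i.e. only the "pretty good" line search; the "very bad" branch is not implemented.
    def Q(x):
        return evaluate(terms, [x * a + b for a, b in zip(w1, w2)])

    xs = [-1.0 + 2.0 * k / (num_grid - 1) for k in range(num_grid)]
    best_x = max(xs, key=Q)
    return {"S": S, "w1": w1, "w2": w2, "C": C, "x": best_x, "Q": Q}


gen = random.Random(42)
n, K, m = 30, 3, 120
terms = [(gen.choice((-1, 1)), tuple(gen.sample(range(n), K))) for _ in range(m)]
result = algorithm1_line_search(terms, n, p=0.5)
print(result["C"], result["Q"](0.0), result["Q"](result["x"]))
```

Since the grid contains $x=0$, the returned point is never worse than $\vec{w}_2$ itself.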

By theorem 9.23 of R. O'Donnell, Analysis of Boolean Functions (Cambridge University Press, 2014), the following holds for any function $f$ of degree at most $K$ from $\{-1,1\}^N\to\mathbb{R}$ and any $t\ge(2e)^{K/2}$:

$\begin{matrix} {\Pr_{x\in\{-1,1\}^N}\left[|f(x)|\ge t\,\mathbb{E}[f^2]^{1/2}\right]\le\exp\left(-\frac{K}{2e}t^{2/K}\right).} & (25) \end{matrix}$

By theorem 9.24 of R. O'Donnell, Analysis of Boolean Functions (Cambridge University Press, 2014), for any nonconstant function $f$ of degree at most $K$ from $\{-1,1\}^N\to\mathbb{R}$,

$\begin{matrix} {\Pr_{x\in\{-1,1\}^N}\left[f(x)>\mathbb{E}[f]\right]\ge\frac{1}{4}e^{-2K}.} & (26) \end{matrix}$

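Hence, for any function $f$ of degree at most $K$ from $\{-1,1\}^N\to\mathbb{R}$, one can apply Eq. (26) to $f^2$, which has degree at most $2K$. (If $f^2$ is constant, the bound below holds trivially with probability 1.) This gives

$\begin{matrix} {\Pr_{x\in\{-1,1\}^N}\left[|f(x)|\ge\mathbb{E}[f^2]^{1/2}\right]\ge\frac{1}{4}e^{-4K}.} & (27) \end{matrix}$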

Applying these bounds to the force $F_i$, one finds that the expected value of $|F_i|$ is at least $\sqrt{D}\exp(-O(K))$. At the same time, the expected value of $H_Z(\vec{w}_2)$ is equal to zero.
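As a rough numerical illustration, not part of the original argument, one can model a force as a sum of $D$ distinct degree-$(K-1)$ monomials with random $\pm 1$ coefficients. For this model $\mathbb{E}[F^2]=D$ exactly, and $\mathbb{E}|F|/\sqrt{D}$ can be estimated by Monte Carlo. The model and all parameters below are hypothetical. For these parameters the estimate is of order one, far above the worst-case guarantee $\frac{1}{4}e^{-4K}$ from Eq. (27).

```python
import math
import random


def random_force(n, K, D, rng):
    """D distinct degree-(K-1) monomials with random +/-1 coefficients (hypothetical model of F_i)."""
    monomials = set()
    while len(monomials) < D:
        monomials.add(tuple(sorted(rng.sample(range(n), K - 1))))
    return [(rng.choice((-1, 1)), sites) for sites in sorted(monomials)]


def mean_abs(poly, n, samples, rng):
    """Monte Carlo estimate of E|F(z)| for z uniform on {-1, +1}^n."""
    total = 0.0
    for _ in range(samples):
        z = [rng.choice((-1, 1)) for _ in range(n)]
        total += abs(sum(c * math.prod(z[j] for j in sites) for c, sites in poly))
    return total / samples


rng = random.Random(7)
n, K, D = 40, 4, 50
poly = random_force(n, K, D, rng)  # distinct monomials, so E[F^2] = D exactly
ratio = mean_abs(poly, n, 5000, rng) / math.sqrt(D)
print(ratio, 0.25 * math.exp(-4 * K))
```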

B. Analysis of Classical Algorithm

The expected value of $|F_i|$ is at least $\sqrt{D}\exp(-O(K))$, and each site lies in $S$ with constant probability $p$. Hence the constant $C$ in the algorithm has expected value at least $N\sqrt{D}\exp(-O(K))$.

The algorithm chooses either case 3A or case 3B at least half the time (or any other fraction $\Omega(1)$ rather than one half). Conditioned on that case, the expected value of $|F_i|$ must still be at least $\sqrt{D}\exp(-O(K))$. Hence, at least one of the following holds:

- the algorithm chooses case 3A at least half the time and has expected $H_Z(\vec{u})$ at least $N\epsilon\sqrt{D}\exp(-O(K))$;
- the algorithm chooses case 3B at least half the time and has expected $H_Z(\vec{u})$ at most $-N\sqrt{D}\exp(-O(K))/\epsilon$.

Note that for odd $K$ one can then guarantee an expected $H_Z(\vec{u})\ge N\sqrt{D}\exp(-O(K))$, as in B. Barak, A. Moitra, R. O'Donnell, P. Raghavendra, O. Regev, D. Steurer, L. Trevisan, A. Vijayaraghavan, D. Witmer, and J. Wright, arXiv preprint arXiv:1505.03424 (2015). To see this, pick $\epsilon=1$. If case 3B occurs, change the sign of all variables; for odd $K$ this changes the sign of $H_Z$.

Note also that the guarantee on the classical algorithm involves the expected value of $H_Z$. This expected value is at least a $1/\mathrm{poly}(D)$ fraction of the maximum possible value, which is $O(ND)$. So, by repeating the algorithm $\mathrm{poly}(D)$ times, one can, with probability at least one-half, obtain a solution within a constant factor of the expected value.
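The repetition argument can be made explicit with a reverse Markov bound. Write $\mu>0$ for the expected value of $H_Z$ and $M=O(ND)$ for its maximum. A sketch of the calculation is:

```latex
% mu = E[H_Z] > 0 and H_Z <= M = O(ND) for every outcome.
\mu \;\le\; \frac{\mu}{2}\,\Pr\!\left[H_Z < \frac{\mu}{2}\right] + M\,\Pr\!\left[H_Z \ge \frac{\mu}{2}\right]
    \;\le\; \frac{\mu}{2} + M\,\Pr\!\left[H_Z \ge \frac{\mu}{2}\right]
\quad\Longrightarrow\quad
\Pr\!\left[H_Z \ge \frac{\mu}{2}\right] \;\ge\; \frac{\mu}{2M}.

% With r independent runs, the chance that every run has H_Z < mu/2 is
\left(1 - \frac{\mu}{2M}\right)^{r} \;\le\; e^{-r\mu/(2M)} \;\le\; \frac{1}{2}
\quad\text{whenever}\quad r \;\ge\; \frac{2M\ln 2}{\mu}.
```

Since $M/\mu=\mathrm{poly}(D)$ under the assumption above, $r=\mathrm{poly}(D)$ repetitions suffice. They yield, with probability at least one-half, a solution with $H_Z\ge\mu/2$.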

VI. Analysis of Quantum Algorithm

An embodiment of the quantum quench algorithm disclosed herein will now be analyzed in more detail. From Eq. (8),

${\langle{\tau_{T}^{H}\left( H_{Z} \right)}\rangle} = {D{\frac{N - {\langle{\tau_{T}^{H}(X)}\rangle}_{+}}{\alpha}.}}$

One can estimate $\langle\tau_T^H(X)\rangle_+$ site by site. For each site $i$, one estimates $\langle\tau_T^H(X_i)\rangle_+$, and summing over $i$ gives $\langle\tau_T^H(X)\rangle_+$.
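The identity behind the relation between $\langle\tau_T^H(H_Z)\rangle_+$ and $\langle\tau_T^H(X)\rangle_+$ can be checked by brute force on a few qubits. The sketch below makes several assumptions:

- $H=X+(\alpha/D)H_Z$ with $X=\sum_iX_i$;
- $D$ is the maximum number of terms containing a site;
- the instance (six qubits, eight random $\pm 1$ three-body terms) is hypothetical.

Starting from $\psi_+$, it evolves the state exactly and prints $\langle X\rangle$, $\langle H_Z\rangle$, and $D(N-\langle X\rangle)/\alpha$. The last quantity should agree with $\langle H_Z\rangle$ at every time, because $\langle H\rangle$ is conserved and $\langle H_Z\rangle_+=0$.

```python
import numpy as np

PAULI_X = np.array([[0.0, 1.0], [1.0, 0.0]])


def site_operator(op, i, n):
    """Kronecker product placing a single-qubit operator on qubit i of n (qubit 0 first)."""
    out = np.array([[1.0]])
    for j in range(n):
        out = np.kron(out, op if j == i else np.eye(2))
    return out


def hz_diagonal(terms, n):
    """Diagonal of H_Z = sum c * prod_{j in sites} Z_j in the computational basis."""
    b = np.arange(2 ** n)
    z = 1 - 2 * ((b[:, None] >> (n - 1 - np.arange(n))[None, :]) & 1)  # z[b, j] = +/-1
    return sum(c * np.prod(z[:, list(sites)], axis=1) for c, sites in terms)


def quench(terms, n, alpha, D, times):
    """Evolve |+...+> under H = X + (alpha/D) H_Z; return (<X>, <H_Z>) at each time."""
    X = sum(site_operator(PAULI_X, i, n) for i in range(n))
    hz = hz_diagonal(terms, n)
    H = X + (alpha / D) * np.diag(hz)
    evals, evecs = np.linalg.eigh(H)  # H is real symmetric
    psi0 = np.full(2 ** n, 2.0 ** (-n / 2))  # |+...+>, the +1 eigenstate of every X_i
    amps = evecs.T @ psi0
    out = []
    for t in times:
        psi = evecs @ (np.exp(-1j * evals * t) * amps)
        out.append((np.vdot(psi, X @ psi).real, np.sum(hz * np.abs(psi) ** 2)))
    return out


rng = np.random.default_rng(3)
n, alpha = 6, 1.0
terms = [(float(rng.choice([-1, 1])), tuple(int(j) for j in rng.choice(n, size=3, replace=False)))
         for _ in range(8)]
D = max(sum(1 for _, s in terms if i in s) for i in range(n))
times = np.linspace(0.0, 2.0, 5)
results = quench(terms, n, alpha, D, times)
for t, (x, e) in zip(times, results):
    print(f"t={t:.2f}  <X>={x:.6f}  <H_Z>={e:.6f}  D(N-<X>)/alpha={D * (n - x) / alpha:.6f}")
```

Averaging the returned $\langle H_Z\rangle$ values over randomly sampled times gives a time-averaged expectation value. Full state-vector simulation of this kind is limited to very small $N$.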

The basic physical idea is that if one can ignore the time dependence of the force $F_i$, then one can approximate $\langle\tau_T^H(X_i)\rangle_+$ by the expectation value of $X_i$, assuming that spin $i$ evolves for a time $T$ under a time-independent Hamiltonian. This time-independent Hamiltonian consists of two parts:

- a transverse field of strength 1 (i.e., the term $X_i$ in the Hamiltonian);
- a parallel field $(\alpha/D)F_i$, where $F_i$ is the force assuming that all other spins $Z_j$ for $j\neq i$ are drawn from a uniformly random distribution.

The uniform distribution arises because at time $T=0$ the state of the system is $\psi_+$, which has equal amplitude on all computational basis states. In this case, similar to the analysis of the classical algorithm, the force $F_i$ is likely to be at least of order $\sqrt{D}$. One will then have $1-\langle\tau_T^H(X_i)\rangle_+\sim(\alpha/D)^2\langle F_i^2\rangle_+T^2\sim\alpha^2T^2/D$.
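The time-independent approximation can be solved exactly for a single spin. Treat the force as a fixed number (the frozen-force assumption of this paragraph) and write $h=(\alpha/D)F_i$. Evolving $|+\rangle$ under $X+hZ$ then gives $\langle X\rangle=1-\frac{2h^2}{1+h^2}\sin^2(\sqrt{1+h^2}\,T)$. So $1-\langle X\rangle\approx 2h^2T^2$ for small $T$, consistent with the scaling $(\alpha/D)^2F_i^2T^2$ up to a constant factor. The Python sketch below checks this formula. The numerical values of $\alpha$, $D$, $F_i$, and $T$ are illustrative.

```python
import numpy as np

X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])
PLUS = np.array([1.0, 1.0]) / np.sqrt(2.0)


def frozen_force_x(h, T):
    """<X> after evolving |+> for time T under X + h Z, where h plays the role of (alpha/D) F_i.

    Uses exp(-iHT) = cos(wT) I - i sin(wT) H / w, valid because H^2 = w^2 I with w = sqrt(1 + h^2).
    """
    w = np.sqrt(1.0 + h ** 2)
    H = X + h * Z
    U = np.cos(w * T) * np.eye(2) - 1j * np.sin(w * T) * H / w
    psi = U @ PLUS
    return np.vdot(psi, X @ psi).real


def closed_form(h, T):
    """1 - (2 h^2 / (1 + h^2)) sin^2(wT), which is about 1 - 2 h^2 T^2 for small T."""
    w = np.sqrt(1.0 + h ** 2)
    return 1.0 - 2.0 * h ** 2 / w ** 2 * np.sin(w * T) ** 2


alpha, D, F, T = 2.0, 100.0, 12.0, 0.01  # illustrative values, F of order sqrt(D)
h = alpha * F / D
print(1.0 - frozen_force_x(h, T), 2.0 * h ** 2 * T ** 2)
```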

However, one cannot always neglect the time dependence of the force. To estimate whether this time dependence is significant, one can compare the time derivative of the force to $\sqrt{D}/T$. If the time derivative is small enough compared to $\sqrt{D}/T$, then the approximation of the above paragraph will be valid. If it is not so small, one can derive a duality similar to the classical case. Indeed, one will also have some stronger results here when the final state has a large expectation value of $X$.

Subsection VI A analyzes the time-independent case. Subsection VI B gives a further approximate analysis based on the second time derivative of the expectation value of $X$. Subsection VI C describes a toy example in which one can see the effects of time dependence. Subsection VI D considers the errors made by ignoring the time dependence, and the main results are obtained there.

A. Time-Independent Force

First, the time-independent force approximation is analyzed in more detail, before the time dependence is considered. One wishes to compute

$\langle{{\exp \left( {{- {i\left( {{\frac{\alpha}{D}Z_{i}F_{i}} + X_{i}} \right)}}T} \right)}\psi_{+}{X_{i}}{\exp \left( {{- {i\left( {{\frac{\alpha}{D}Z_{i}F_{i}} + X_{i}} \right)}}T} \right)}{\psi_{+}.}}$

That is, consideration is given to an evolution under a Hamiltonian which includes the coupling (α/D)Z_(i)F_(i) and the transverse field X_(i), but ignoring any other coupling terms which would give the remaining qubits a time-dependence in the Z-basis.

As in the analysis of the classical case, the probability that $|F_i|\ge\sqrt{D}$ is at least $\frac{1}{4}\exp(-4K)$. At the same time, by Eq. (25), the probability that $|F_i|\ge t\sqrt{D}$ for $t\ge(2e)^{K/2}$ is at most

$\exp\left(-\frac{K}{2e}t^{2/K}\right).$

Pick $t$ sufficiently large; for example, $t=C^{K/2}$ suffices for a sufficiently large $K$-independent constant $C\ge 2e$. Then this probability is much smaller than $\frac{1}{8}\exp(-4K)$. So, with probability at least $\frac{1}{8}\exp(-4K)$, one has $|F_i|\in[\sqrt{D},C^{K/2}\sqrt{D}]$.
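Here is one explicit constant, assumed for illustration rather than taken from the original analysis: $C=2e(5+\ln 8)\approx 38.5$. It satisfies $C\ge 2e$ and makes the tail bound of Eq. (25) at $t=C^{K/2}$ equal to $e^{-5K}8^{-K}$. That is at most $e^{-K}\cdot\frac{1}{8}e^{-4K}$ for every $K\ge 1$. A quick numerical check:

```python
import math

# One explicit K-independent choice (an assumption, not the only one): C = 2e (5 + ln 8), about 38.5.
C_CONST = 2 * math.e * (5 + math.log(8))


def tail_bound(K, C):
    """Right-hand side of Eq. (25) at t = C^(K/2); since t^(2/K) = C, it equals exp(-K C / (2e))."""
    return math.exp(-K * C / (2 * math.e))


for K in (1, 2, 4, 8, 16):
    target = 0.125 * math.exp(-4 * K)  # (1/8) exp(-4K)
    print(K, tail_bound(K, C_CONST) / target)  # at most exp(-K)
```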

Then, for $C^{K/2}\sqrt{D}T$ sufficiently small compared to 1 and for any $|F_i|$ in that interval, one has

$\begin{matrix} {\left\langle\psi_+\left|\exp\left(i\left(\frac{\alpha}{D}Z_iF_i+X_i\right)T\right)X_i\exp\left(-i\left(\frac{\alpha}{D}Z_iF_i+X_i\right)T\right)\right|\psi_+\right\rangle\le 1-\frac{\alpha^2T^2}{D}\exp(-O(K)).} & (28) \end{matrix}$

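Thus, suppose this time-independent approximation is valid, and valid for all $i$. Using $\langle\tau_T^H(H_Z)\rangle_+=D(N-\langle\tau_T^H(X)\rangle_+)/\alpha$, one has

$\begin{matrix} {\langle\tau_T^H(H_Z)\rangle_+\ge\alpha T^2\exp(-O(K))N.} & (29) \end{matrix}$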

Remark: here, an upper bound on the force $F_i$ was used because the time is fixed. If one averages over times in an interval, such an upper bound is not necessary.

B. Second Derivative of Expectation Value of X

One can give another approximate analysis by considering

$\begin{matrix} {\partial_T^2\langle\tau_T^H(X_i)\rangle_+=-\frac{\alpha^2}{D^2}\langle\tau_T^H(X_iF_i^2)\rangle_++\frac{\alpha}{D}\langle\tau_T^H(Y_i\dot{F}_i)\rangle_+,} & (30) \end{matrix}$

where, for any operator $O$, $\dot{O}\equiv-i[O,H]$.

For T=0 the first term is equal to

$- {\frac{\alpha^{2}}{D}.}$

Assume that the first term remains

$-\Omega(1)\frac{\alpha^2}{D};$

this assumption is considered in more detail below. Then one has

$\langle\tau_T^H(X_i)\rangle_+=1-\Omega(1)\frac{\alpha^2T^2}{D}$

unless the second term

$\frac{\alpha}{D}{\langle{\tau_{T}^{H}\left( {Y_{i}{\overset{.}{F}}_{i}} \right)}\rangle}_{+}$

also becomes

$\Omega (1){\frac{\alpha^{2}}{D}.}$

For this to happen, one needs $\langle\tau_T^H(Y_i\dot{F}_i)\rangle_+=\Omega(1)\alpha$.

Thus, under the assumption about the first term, one of two situations occurs. In the first, after time $T$, one has

$\langle\tau_T^H(X)\rangle_+=N\cdot\left(1-\Omega(1)\frac{\alpha^2T^2}{D}\right)$

so that

$\langle\tau_T^H(H_Z)\rangle_+=\Omega(1)\alpha T^2N.$

In the second, one has $\sum_i\langle\tau_s^H(Y_i\dot{F}_i)\rangle_+=\Omega(1)\alpha N$ for some time $s\le T$. Further, at that time $s$, if one does not have

$\langle\tau_s^H(X)\rangle_+\ge N\cdot\left(1-O(1)\frac{\alpha^2T^2}{D}\right),$

then one has

$\langle\tau_s^H(H_Z)\rangle_+=\Omega(1)\alpha T^2N.$

So, either the algorithm finds a state (by sampling over times $s\le T$) with expectation value of $H_Z$ at least

$\Omega(1)\alpha T^2N,$

or there is some state with expectation value of $X$ at least

$N\cdot\left(1-O(1)\frac{\alpha^2T^2}{D}\right)$

and expectation value of $\sum_iY_i\dot{F}_i$ at least $\Omega(1)\alpha N$. Choosing $\alpha^2T^2\sim D$, this yields the same guarantees as the classical algorithm for $\alpha=\sqrt{D}/\epsilon$. The reason is that, using the same rounding as in the classical case, a large expectation value of $\sum_iY_i\dot{F}_i$ implies that one can construct a solution whose expectation value of $H_Z$ is large in absolute value.

For smaller α²T², one will see that additional guarantees can be given.

C. Toy Example

The Hamiltonian of Eq. (3) provides an interesting example to study the time-dependence of the force. Defining Z=Σ_(i)Z_(i), for the Hamiltonian (3) one has H_(Z)=−½Z²+const. (This constant is negative and of order N.) Hence, up to an additive constant, one has

${H = {X = {{\frac{\alpha}{2D}Z^{2}} \approx {X - {\frac{\alpha}{2N}Z^{2}}}}}},$

since D=N−1. This system can be approximately treated as a harmonic oscillator, at least for X close to N. One can work in all eigenbasis of Z, letting state |z

denote an eigenstate of Z with eigenvalue z. In the large X regime, the wavefunction has most of its probability on basis states with i close to zero where the X operator is approximately equal to (N/2)|z

z+1|+h.c.. One can approximate further by treating z as a continuous variable, approximating (N/2)|z

z+1|+h.c. by N+(N/2)∂_(z) ², valid in the long wavelength regime. One then gets that the Hamiltonian is equal to (ignoring additive constants) approximately equal to

$\frac{N}{2}{\partial_{z}^{2}{- \frac{\alpha}{2N}}}z^{2}$

Other than having a minus sign in front, this Hamiltonian is the familiar Hamiltonian for a harmonic oscillator, with inverse mass $N$ and spring constant $\alpha/N$. The oscillator has angular frequency

$\begin{matrix} {\omega=\sqrt{\alpha}.} & (31) \end{matrix}$

The z variable oscillates periodically with time at the given frequency. The force F_(i) at time t is (in this continuum approximation) equal to z(t). Hence, if αT²≳1, then the time-dependence of the force cannot be neglected in this example. Note that here again, one can see this product αT² appearing; in the time-independent analysis above (and in the previous heuristic analysis), this product controls the expectation value of H_(Z). Thus, it is no surprise that for this toy example the time-independent approximation breaks down since there is no way to make the expectation value of H_(Z) be large compared to N for this instance.

D. Time-Dependent Force

Define

$\begin{matrix} {\Delta_s=F_i-\tau_s^H(F_i).} & (32) \end{matrix}$

One has the following. It holds for any state on the right-hand side, i.e., one may replace $\psi_+$ with an arbitrary state in the next two equations:

$\begin{matrix} {{{{\exp \left( {- {iHT}} \right)}{\exp \left( {{- i}\frac{\alpha}{D}{\int_{0}^{T}{Z_{i}\Delta_{s}{ds}}}} \right)}\psi_{+}} = {{{\exp \left( {- {iHT}} \right)}\psi_{+}} + {{\exp \left( {- {iHT}} \right)}\xi}}},} & (33) \end{matrix}$

where the exponential is an $s$-ordered exponential (i.e., it is time-ordered with respect to $s$, as are the later exponentials of integrals below), and

$\begin{matrix} {\xi=-i\frac{\alpha}{D}\int_0^Tds\,\exp\left(-i\frac{\alpha}{D}\int_s^TZ_i\Delta_u\,du\right)Z_i\Delta_s\psi_+.} & (34) \end{matrix}$

So, since

$\exp\left(-i\frac{\alpha}{D}\int_s^TZ_i\Delta_u\,du\right)$

is unitary, by the triangle inequality one has

$\begin{matrix} {|\xi|\le\frac{\alpha}{D}\int_0^Tds\,\sqrt{\langle\Delta_s^2\rangle_+}.} & (35) \end{matrix}$

Define

$\begin{matrix} {{\varphi (T)} = {{\exp \left( {- {iHT}} \right)}{\exp \left( {{- i}\frac{\alpha}{D}{\int_{0}^{T}{Z_{i}\Delta_{s}{ds}}}} \right)}{\psi_{+}.}}} & (36) \end{matrix}$

This definition of $\varphi(T)$ has the following property, as can be seen by going to the interaction representation. Define the operator $R$ by

$\begin{matrix} {H=X_i+\frac{\alpha}{D}Z_iF_i+R,} & (37) \end{matrix}$

so that R includes all terms in H which are not supported on site i. Then,

${\varphi (T)} = {{\exp \left( {- {iRT}} \right)}{\exp \left( {{- {i\left( {{\frac{\alpha}{D}Z_{i}F_{i}} + X_{i}} \right)}}T} \right)}{\psi_{+}.}}$

Hence,

$\begin{matrix} {{\langle{{\varphi (T)}{X_{i}}{\varphi (T)}}\rangle} = {\langle{{\exp \left( {{- {i\left( {{\frac{\alpha}{D}Z_{i}F_{i}} + X_{i}} \right)}}T} \right)}\psi_{+}{X_{i}}{\exp \left( {{- {i\left( {{\frac{\alpha}{D}Z_{i}F_{i}} + X_{i}} \right)}}T} \right)}{\psi_{+}.}}}} & (38) \end{matrix}$

Define

ψ₊(T)=exp(−iTH)ψ₊.   (39)

So,

ϕ(T)=ψ₊(T)+ξ.  (40)

Hence,

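$\begin{matrix} {\langle\varphi(T)|X_i|\varphi(T)\rangle=\langle\tau_T^H(X_i)\rangle_++2\,\mathrm{Re}\,\langle\psi_+(T)|X_i|\xi\rangle+\langle\xi|X_i|\xi\rangle.} & (41) \end{matrix}$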

Let $\Pi_i^-=(1-X_i)/2$, so that it projects onto the $|-\rangle$ state on qubit $i$. So,

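$\begin{matrix} {\langle\varphi(T)|\Pi_i^-|\varphi(T)\rangle=\langle\tau_T^H(\Pi_i^-)\rangle_++2\,\mathrm{Re}\,\langle\psi_+(T)|\Pi_i^-|\xi\rangle+\langle\xi|\Pi_i^-|\xi\rangle.} & (42) \end{matrix}$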

By Cauchy-Schwarz, the second term in the above equation is bounded in absolute value by $2\sqrt{\langle\tau_T^H(\Pi_i^-)\rangle_+}\,|\xi|$. The third term is bounded by $|\xi|^2$.

Hence,

$\begin{matrix} {\langle\varphi(T)|\Pi_i^-|\varphi(T)\rangle\le\langle\tau_T^H(\Pi_i^-)\rangle_++2\sqrt{\langle\tau_T^H(\Pi_i^-)\rangle_+}\,|\xi|+|\xi|^2.} & (43) \end{matrix}$

So,

$\begin{matrix} {\langle\tau_T^H(\Pi_i^-)\rangle_+\ge\langle\varphi(T)|\Pi_i^-|\varphi(T)\rangle-2\sqrt{\langle\tau_T^H(\Pi_i^-)\rangle_+}\,|\xi|-|\xi|^2.} & (44) \end{matrix}$

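Thus, one can instead expand $\psi_+(T)=\varphi(T)-\xi$ and bound the cross term in the same way by Cauchy-Schwarz, noting that $\langle\xi|\Pi_i^-|\xi\rangle\ge 0$. This gives

$\begin{matrix} {\langle\tau_T^H(\Pi_i^-)\rangle_+\ge\langle\varphi(T)|\Pi_i^-|\varphi(T)\rangle-2\sqrt{\langle\varphi(T)|\Pi_i^-|\varphi(T)\rangle}\,|\xi|-|\xi|^2.} & (45) \end{matrix}$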

So, if one can bound $|\xi|$ to be sufficiently small compared to $\sqrt{\langle\varphi(T)|\Pi_i^-|\varphi(T)\rangle}$, then one can lower bound $\langle\tau_T^H(\Pi_i^-)\rangle_+$ in terms of $\langle\varphi(T)|\Pi_i^-|\varphi(T)\rangle$. For example, suppose $|\xi|\le\sqrt{\langle\varphi(T)|\Pi_i^-|\varphi(T)\rangle}/3$. Then

$\langle\tau_T^H(\Pi_i^-)\rangle_+\ge\langle\varphi(T)|\Pi_i^-|\varphi(T)\rangle\cdot(1-2/3-1/9)=(2/9)\cdot\langle\varphi(T)|\Pi_i^-|\varphi(T)\rangle.$

If one can give even tighter bounds on $|\xi|$, then the ratio $\langle\tau_T^H(\Pi_i^-)\rangle_+/\langle\varphi(T)|\Pi_i^-|\varphi(T)\rangle$ is bounded below by a quantity that tends to 1 as $|\xi|\to 0$.
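Inequality (45) only uses that $\varphi(T)$ and $\psi_+(T)$ differ by a vector of norm $|\xi|$ and that $\Pi_i^-$ is a projector. So it can be sanity-checked on random vectors. In the Python sketch below, which is an illustration and not part of the proof, $\xi$ is taken to be the difference of two random unit vectors, and a random projector stands in for $\Pi_i^-$. The function reports the smallest slack over many trials, which should be non-negative.

```python
import numpy as np


def random_unit(rng, dim):
    """Random complex unit vector."""
    v = rng.standard_normal(dim) + 1j * rng.standard_normal(dim)
    return v / np.linalg.norm(v)


def min_slack_eq45(rng, dim=16, rank=4, trials=2000):
    """Smallest value of <Pi>_psi - (<Pi>_phi - 2 sqrt(<Pi>_phi) |xi| - |xi|^2) over random trials."""
    worst = np.inf
    for _ in range(trials):
        # A random rank-`rank` orthogonal projector stands in for Pi_i^-.
        m = rng.standard_normal((dim, rank)) + 1j * rng.standard_normal((dim, rank))
        q, _ = np.linalg.qr(m)
        proj = q @ q.conj().T
        psi = random_unit(rng, dim)  # plays the role of psi_+(T)
        phi = psi + rng.uniform(0.0, 0.5) * random_unit(rng, dim)
        phi /= np.linalg.norm(phi)  # phi(T) is also a unit vector
        xi_norm = np.linalg.norm(phi - psi)  # |xi|, since phi(T) = psi_+(T) + xi
        p_psi = np.vdot(psi, proj @ psi).real
        p_phi = max(np.vdot(phi, proj @ phi).real, 0.0)
        rhs = p_phi - 2.0 * np.sqrt(p_phi) * xi_norm - xi_norm ** 2
        worst = min(worst, p_psi - rhs)
    return worst


print(min_slack_eq45(np.random.default_rng(5)))  # non-negative, as Eq. (45) requires
```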

So, one can now bound $|\xi|$. From Eq. (35),

$|\xi|\le\frac{\alpha}{D}\int_0^Tds\,\sqrt{\langle\Delta_s^2\rangle_+}.$

So, one turns to bounding $\langle\Delta_s^2\rangle_+$. Since $\Delta_0=0$ and $\partial_s\tau_s^H(F_i)=\tau_s^H(\dot{F}_i)$, one has $\Delta_s\psi_+=-\int_0^sdv\,\tau_v^H(\dot{F}_i)\psi_+$. So, again by Cauchy-Schwarz,

$\begin{matrix} {\langle\Delta_s^2\rangle_+\le s\int_0^sdv\,\langle|\tau_v^H(\dot{F}_i)|^2\rangle_+.} & (46) \end{matrix}$

So,

$\begin{matrix} {{\xi } \leq {\frac{\alpha}{D}{\int_{0}^{T}{{ds}\mspace{14mu} s{\sqrt{\frac{\int_{0}^{s}{d\; \upsilon {\langle{{\tau_{\upsilon}^{H}\left( {\overset{.}{F}}_{i} \right)}}^{2}\rangle}_{+}}}{s}}.}}}}} & (47) \end{matrix}$

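Write $g(v)=\langle|\tau_v^H(\dot{F}_i)|^2\rangle_+$. Choose $s$ randomly in the interval $[0,T]$ from the measure $(T^2/2)^{-1}s\,ds$, and then choose $v$ uniformly at random in the interval $[0,s]$. The right-hand side of Eq. (47) equals $\alpha T^2/(2D)$ times the expectation of $\sqrt{\mathbb{E}[g(v)\mid s]}$. By concavity of the square root, this is at most $\alpha T^2/(2D)$ times $\sqrt{\mathbb{E}_v[g(v)]}$. This random choice of $s$ followed by a random choice of $v$ induces the measure

$\begin{matrix} {d\mu(v)=\frac{2}{T^2}(T-v)\,dv,} & (48) \end{matrix}$

which reduces to $2(1-v)\,dv$ for $T=1$.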
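The induced measure can be checked by sampling. Draw $s$ from the density $2s/T^2$ on $[0,T]$ (via $s=T\sqrt{U}$ with $U$ uniform) and then $v$ uniformly on $[0,s]$. Then $v$ has density $\int_v^T\frac{2s}{T^2}\frac{1}{s}\,ds=\frac{2(T-v)}{T^2}$, which is $2(1-v)$ for $T=1$. The Python sketch below compares two of its properties with Monte Carlo estimates: the mean $T/3$ and $\Pr[v\le T/2]=3/4$. The value of $T$ is arbitrary.

```python
import numpy as np


def sample_v(rng, T, size):
    """Draw s on [0, T] with density 2 s / T^2, then v uniformly on [0, s]."""
    s = T * np.sqrt(rng.random(size))  # inverse CDF of the density 2 s / T^2
    return s * rng.random(size)


rng = np.random.default_rng(11)
T = 3.0
v = sample_v(rng, T, 400_000)
# The induced density is 2 (T - v) / T^2, whose mean is T/3 and whose CDF at T/2 is 3/4.
print(v.mean(), T / 3, np.mean(v <= T / 2), 0.75)
```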

Thus, consider when $\langle\tau_T^H(\Pi_i^-)\rangle_+/\langle\varphi(T)|\Pi_i^-|\varphi(T)\rangle$ is small compared to 1. As shown in subsection VI A, $\langle\varphi(T)|\Pi_i^-|\varphi(T)\rangle$ is at least $\exp(-O(K))\alpha^2T^2/D$. So, for a random choice of $v$ from the measure $\mu(v)$, one needs the quantity $\sqrt{\mathbb{E}_v[\langle|\tau_v^H(\dot{F}_i)|^2\rangle_+]}$ to be at least

$\begin{matrix} {\exp(-O(K))\sqrt{\frac{\alpha^2T^2}{D}}\frac{2D}{\alpha T^2}=\exp(-O(K))\frac{\sqrt{D}}{T}.} & (49) \end{matrix}$

Intuitively, Eq. (49) is clear. For the force to be small at time $T$, the magnitude of $\dot{F}_i$ (i.e., “how quickly the force is changing in time”) must be comparable to the force at time 0 (i.e., to $\sqrt{D}$) divided by the time $T$.

Hence, one obtains the following lemma.

Lemma 4. For $T\le\exp(-O(K))/\sqrt{D}$, at least one of the following two possibilities holds:

1. $\begin{matrix} {\langle\tau_T^H(H_Z)\rangle_+\ge\alpha T^2\exp(-O(K))N.} & (50) \end{matrix}$

2. $\sum_i\sqrt{\mathbb{E}_v[\langle|\tau_v^H(\dot{F}_i)|^2\rangle_+]}$ is at least $\exp(-O(K))\frac{\sqrt{D}}{T}$.

Proof. It was shown above that, for each site $i$, the following holds. If $\langle\tau_T^H(\Pi_i^-)\rangle_+/\langle\varphi(T)|\Pi_i^-|\varphi(T)\rangle$ is small compared to 1, then for a random choice of $v$ from the measure $\mu(v)$, the quantity $\sqrt{\mathbb{E}_v[\langle|\tau_v^H(\dot{F}_i)|^2\rangle_+]}$ is at least $\exp(-O(K))\frac{\sqrt{D}}{T}$. If this ratio is small compared to 1 for at least half the sites $i$, then item 2 holds. If it is small compared to 1 for fewer than half the sites, then item 1 holds. □

At the time $v$ of item 2 of lemma 4, one has $\langle\tau_v(H_Z)\rangle=D(N-\langle\tau_v(X)\rangle)/\alpha$. So at least one of the following holds:

$\langle\tau_v(H_Z)\rangle\ge\alpha T^2\exp(-O(K))N\quad\text{or}\quad\langle\tau_v(X)\rangle\ge\left(1-\frac{\alpha^2T^2}{D}\exp(-O(K))\right)\cdot N.$

Hence, considering the mixed state averaged over $v$, these results hold in expectation.

Hence, one obtains the following theorem.

Theorem 2. For $T\le\exp(-O(K))/\sqrt{D}$, at least one of the following two possibilities holds:

1. Using a quench, one can produce a quantum state in polynomial time with expectation value of $H_Z$ at least $\alpha T^2\exp(-O(K))N$. This is done by evolving for some time $s\in[0,T]$, possibly chosen at random.

2. For some state $\psi$ with

$\langle\psi|X|\psi\rangle\ge\left(1-\frac{\alpha^2T^2}{D}\exp(-O(K))\right)\cdot N,$

one has that

$\sum\limits_i\sqrt{\langle\psi|\dot{F}_i^2|\psi\rangle}\ge\exp(-O(K))\frac{\sqrt{D}}{T}.$

Proof. If item 1 of lemma 4 holds, then item 1 of this theorem holds by choosing $s=T$. Suppose item 2 of lemma 4 holds and

$E_v\left[\langle\tau_v(X)\rangle\right]<\left(1-\frac{\alpha^2T^2}{D}\exp(-O(K))\right)\cdot N.$

Then $E_v[\langle\tau_v(H_Z)\rangle]\ge\alpha T^2\exp(-O(K))N$. So, by choosing a random time $v$, item 1 of this theorem holds on average. Now suppose item 2 of lemma 4 holds and

$E_v\left[\langle\tau_v(X)\rangle\right]\ge\left(1-\frac{\alpha^2T^2}{D}\exp(-O(K))\right)\cdot N.$

Then the mixed state averaged over time $v$ obeys the conditions of item 2, and so some pure state will obey the conditions also. □

This state $\psi$ of item 2 has the property that

$\sum\limits_i\sqrt{\langle\psi|\dot{F}_i^2|\psi\rangle}\ge\exp(-O(K))\frac{\sqrt{D}}{T}.$

One can use this state $\psi$ to construct another state whose expectation value of $H_Z$ is large in absolute value. As with the classical rounding, this expectation value may be very negative rather than very positive. An additional ingredient, discussed in subsection VI E, is the interesting regime with a large expectation value of $X$, where additional bounds can be derived.

To do this, one can pick a random set of sites, called $S$, including each site in this set independently with probability ½. Note that $\dot{F}_i$ is a degree-$(K-1)$ polynomial in Pauli operators. Each term is of degree $K-2$ in Pauli $Z$ operators and of degree 1 in Pauli $Y$ operators. Define $\dot{F}_i^{\bar{S}}$ to include only the terms in this polynomial which do not include sites in $S$. The basic idea, given in more detail below, is to choose $Y_i$ for $i\in S$ to be proportional to $\dot{F}_i^{\bar{S}}$. The constant of proportionality is chosen to have

$\langle X_i\rangle\ge 1-\frac{\alpha^2T^2}{2D}\exp(-O(K))$

for $i\in S$.

In expectation over choices of S, one has

${\sum_{i \in S}\sqrt{\langle{\psi {{{\overset{.}{F}}_{i}^{\overset{\_}{S}}}^{2}}\psi}\rangle}} \geq {{\exp \left( {- {O(K)}} \right)}{\frac{\sqrt{D}}{T}.}}$

So, indeed for some choice of S this holds. Let ρ be the mixed state obtained by tracing out sites in S for such a choice. Let σ_(S) ⁺ be the density matrix on S with all spins polarized in the + direction. Consider the state

${\tau \equiv {{\exp \left( {{ic}{\sum\limits_{i \in S}{{\overset{.}{F}}_{i}^{\overset{\_}{S}}Z_{i}}}} \right)}\left( {\rho \otimes \sigma_{S}^{+}} \right){\exp \left( {{- {ic}}{\sum\limits_{i \in S}{{\overset{.}{F}}_{i}^{\overset{\_}{S}}Z_{i}}}} \right)}}},$

where c is a constant chosen below.

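So long as $c\dot{F}_i^{\bar{S}}=O(1)$, one finds two bounds:

$\mathrm{tr}(\tau X_i)\ge 1-O(c^2)\,\mathrm{tr}(\rho|\dot{F}_i^{\bar{S}}|^2),\qquad\mathrm{tr}\left(\tau\sum_{i\in S}Y_i\dot{F}_i^{\bar{S}}\right)\ge c\sum_{i\in S}\mathrm{tr}(\rho|\dot{F}_i^{\bar{S}}|^2).$

Suppose for the moment that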

${\sum_{i \in S}\sqrt{\langle{\psi {{{\overset{.}{F}}_{i}^{\overset{\_}{S}}}^{2}}\psi}\rangle}} = {{\exp \left( {- {O(K)}} \right)}{\frac{\sqrt{D}}{T}.}}$

That is, a lower bound on the left-hand side was previously assumed, but now one can assume equality. Then, for c=(αT²)/D, one has

$\begin{matrix} {{{tr}\left( {\tau {\sum\limits_{i \in }{Y_{i}{\overset{.}{F}}_{i}^{\overset{\_}{}}}}} \right)} \geq {\alpha \; {{\exp \left( {- {O(K)}} \right)}.}}} & (51) \end{matrix}$

Also,

$\begin{matrix} {{{tr}\left( {\tau \; X_{i}} \right)} \geq {1 - {{O\left( \frac{\alpha^{2}T^{2}}{D} \right)}.}}} & (52) \end{matrix}$

Now, suppose it turns out instead that $\sum_{i\in S}\sqrt{\langle\psi|(\dot{F}_i^{\bar{S}})^2|\psi\rangle}$ is significantly larger than

$\exp(-O(K))\frac{\sqrt{D}}{T}.$

Then one can reduce $c$ proportionally and still obtain a state obeying Eqs. (51) and (52).

So, one finds the following theorem.

Theorem 3. For $T\le\exp(-O(K))/\sqrt{D}$, at least one of the following two possibilities holds:

1. Using a quench, one can produce a quantum state in polynomial time with expectation value of $H_Z$ at least $\alpha T^2\exp(-O(K))N$, by evolving for some time $s\in[0,T]$.

2. For some state $\psi$ with

$\langle\psi|X|\psi\rangle\ge\left(1-\frac{\alpha^2T^2}{D}\exp(-O(K))\right)\cdot N,$

one has that

$\left|\sum_{i\in S}\langle\psi|Y_i\dot{F}_i|\psi\rangle\right|\ge\exp(-O(K))\alpha N.$

For the case $\alpha^2T^2/D=1$, apply the same rounding as in the classical case to item 2. One then finds the same duality as in the classical case, for $\alpha T^2=\epsilon\sqrt{D}$ and $\alpha=\sqrt{D}/\epsilon$.

In the next subsection, additional guarantees present in the quantum algorithm when α²T²/D<<1 are explored.

Further, one can consider the expectation value of higher moments of $\tau_v(X)$. The reason for considering this is explained later. The time evolution conserves the quantity

$H=X+\frac{\alpha}{D}H_Z,$

and it also conserves all moments of this quantity. Note that in the state $\psi_+$, where $X=N$, one has $\langle(H-N)^2\rangle_+=(\alpha^2/D^2)\langle H_Z^2\rangle_+=(\alpha^2/D^2)N_T=N\alpha^2/(DK)$.

Hence, $\langle\tau_T^H((H-N)^2)\rangle_+=N\alpha^2/(DK)$. By Cauchy-Schwarz,

$\begin{matrix} {\langle\tau_T^H((H-N)^2)\rangle_+=\langle\tau_T^H((X-N)^2)\rangle_++2\frac{\alpha}{D}\mathrm{Re}\,\langle\tau_T^H((X-N)H_Z)\rangle_++\frac{\alpha^2}{D^2}\langle\tau_T^H(H_Z^2)\rangle_+\ge\langle\tau_T^H((X-N)^2)\rangle_++\frac{\alpha^2}{D^2}\langle\tau_T^H(H_Z^2)\rangle_+-2\frac{\alpha}{D}\sqrt{\langle\tau_T^H(H_Z^2)\rangle_+\langle\tau_T^H((X-N)^2)\rangle_+}.} & (53) \end{matrix}$

The right-hand side of Eq. (53) equals $\left(\sqrt{\langle\tau_T^H((X-N)^2)\rangle_+}-\frac{\alpha}{D}\sqrt{\langle\tau_T^H(H_Z^2)\rangle_+}\right)^2$, and the left-hand side equals $(\alpha^2/D^2)N_T$. Hence,

$\begin{matrix} {\sqrt{\langle\tau_T^H(H_Z^2)\rangle_+}\ge\frac{D}{\alpha}\sqrt{\langle\tau_T^H((X-N)^2)\rangle_+}-\sqrt{N_T}.} & (54) \end{matrix}$

Hence, one has related fluctuations in $X-N$ to fluctuations in $H_Z$. Suppose $\tau_T^H(H_Z)$ is measured to be greater than $\alpha T^2N$ in absolute value with probability at most $(\alpha T^2/D)^2$. Then, since $\|H_Z\|\le DN$, it follows that $\sqrt{\langle\tau_T^H(H_Z^2)\rangle_+}=O(\alpha T^2N)$, and so

$\begin{matrix} {\sqrt{{\langle{\tau_{T}^{H}\left( \left( {X - N} \right)^{2} \right)}\rangle}_{+}} = {{O\left( {{\frac{\alpha^{2}T^{2}}{D}N} + {\frac{\alpha}{D}\sqrt{N_{T}}}} \right)}.}} & (55) \end{matrix}$

In the limit of large N at fixed D, the quantity √N_T=√(ND/K) grows only as √N, so the second term is negligible compared to the leading term, which is linear in N.
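The conservation argument behind eqs. (53)-(55) can be checked numerically on a small instance. The following Python sketch is a hypothetical illustration rather than the disclosed implementation: it assumes a K=2 instance on a ring of n=6 qubits (so D=2 and N_T=ND/K=6) with random ±1 couplings, replaces the time average τ_T^H by a uniform average over a grid of times, and uses exact diagonalization with NumPy. The helper names (z_diag, x_total, quench_fluctuations) are illustrative. It checks that ⟨(H−N)²⟩ stays equal to (α²/D²)N_T during the evolution and that √⟨τ_T^H((X−N)²)⟩₊ ≤ (α/D)(√⟨τ_T^H(H_Z²)⟩₊ + √N_T), the inequality from which eq. (55) follows.

```python
import numpy as np

def z_diag(n, i):
    """Diagonal of Z_i in the computational basis (qubit 0 = most significant bit)."""
    idx = np.arange(2 ** n)
    return 1.0 - 2.0 * ((idx >> (n - 1 - i)) & 1)

def x_total(n):
    """Dense matrix of X = sum_i X_i."""
    dim = 2 ** n
    idx = np.arange(dim)
    X = np.zeros((dim, dim))
    for i in range(n):
        X[idx, idx ^ (1 << (n - 1 - i))] += 1.0
    return X

def quench_fluctuations(n=6, alpha=1.5, T=4.0, steps=200, seed=0):
    """Quench |+>^n under H = X + (alpha/D) H_Z on a ring (K = 2, D = 2)."""
    rng = np.random.default_rng(seed)
    D = 2
    edges = [(i, (i + 1) % n) for i in range(n)]
    J = rng.choice([-1.0, 1.0], size=len(edges))
    hz = sum(Jt * z_diag(n, i) * z_diag(n, j) for Jt, (i, j) in zip(J, edges))
    dim = 2 ** n
    X = x_total(n)
    H = X + (alpha / D) * np.diag(hz)
    evals, evecs = np.linalg.eigh(H)
    c0 = evecs.conj().T @ np.full(dim, dim ** -0.5)   # |+>^n in the eigenbasis of H
    XmN = X - n * np.eye(dim)
    HmN = H - n * np.eye(dim)
    x2, hz2, h2 = [], [], []
    for t in np.linspace(0.0, T, steps):
        psi = evecs @ (np.exp(-1j * evals * t) * c0)
        x2.append(np.vdot(XmN @ psi, XmN @ psi).real)   # <(X-N)^2>(t)
        hz2.append(np.sum(np.abs(psi) ** 2 * hz ** 2))     # <H_Z^2>(t)
        h2.append(np.vdot(HmN @ psi, HmN @ psi).real)   # <(H-N)^2>(t), conserved
    return np.mean(x2), np.mean(hz2), np.array(h2), len(edges), alpha / D

x2_avg, hz2_avg, h2_t, N_T, ratio = quench_fluctuations()
print(np.allclose(h2_t, ratio ** 2 * N_T))
print(np.sqrt(x2_avg) <= ratio * (np.sqrt(hz2_avg) + np.sqrt(N_T)))
```

A uniform grid average is used because the Cauchy-Schwarz step holds for any convex combination of times, so the bound survives the discretization.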

E. Large X Expectation Value in Duality

The quantum algorithm when α²T²/D<<1 is now considered, i.e., when the expectation value of X in item 2 is close to N, so that each ⟨X_i⟩ is close to 1.

First, a simple mean-field treatment is given: consider some Hamiltonian H₀ of degree K that is diagonal in the Z-basis. Suppose one wishes to maximize the expectation value of H₀ over states with a given expectation value of X. If no constraint were placed on the expectation value of X, then one could maximize H₀ by choosing some state in the computational basis. For each spin i, this state has some expectation value ⟨Z_i⟩=z_i with z_i∈{−1, +1}. If one wishes to obtain a nonzero expectation value of X, then a simple way is to take a product state where each spin has ⟨X_i⟩=cos(θ) and ⟨Z_i⟩=z_i sin(θ), for some angle θ. For θ=π/2, one recovers the classical state. At small θ, the expectation value of H₀ is proportional to θ^K, while the expectation value of 1−X_i is proportional to θ². Thus, for K>2, the expectation value of H₀ drops more rapidly as a function of θ than does the expectation value of 1−X_i.
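The mean-field product state above can be checked directly. In the Python sketch below (helper names are illustrative), each qubit is prepared with Bloch vector (cos θ, 0, z_i sin θ), and a random ±1 instance of H₀ on all 4-subsets of n=5 qubits is assumed for the demonstration. For a product state each Z-string factorizes, so ⟨H₀⟩ equals sin(θ)^K times the classical value H₀(z), while ⟨X_i⟩=cos θ; at small θ these behave as θ^K and 1−θ²/2.

```python
import numpy as np
from functools import reduce
from itertools import combinations

def product_state(theta, z):
    """Product state with <X_i> = cos(theta) and <Z_i> = z_i sin(theta)."""
    qubits = []
    for zi in z:
        a = np.pi / 2 - zi * theta          # Bloch polar angle measured from +Z
        qubits.append(np.array([np.cos(a / 2), np.sin(a / 2)]))
    return reduce(np.kron, qubits)          # qubit 0 = most significant bit

def z_string_diag(n, term):
    """Diagonal of the product of Z_i over the qubits in term."""
    idx = np.arange(2 ** n)
    diag = np.ones(2 ** n)
    for i in term:
        diag *= 1.0 - 2.0 * ((idx >> (n - 1 - i)) & 1)
    return diag

def mean_field_values(theta, z, terms, J):
    n = len(z)
    psi = product_state(theta, z)
    probs = psi ** 2                        # amplitudes are real here
    h0 = sum(Jt * np.dot(probs, z_string_diag(n, t)) for Jt, t in zip(J, terms))
    x0 = np.dot(psi, psi[np.arange(2 ** n) ^ (1 << (n - 1))])   # <X_0>
    classical = sum(Jt * np.prod([z[i] for i in t]) for Jt, t in zip(J, terms))
    return h0, x0, classical

rng = np.random.default_rng(0)
n, K = 5, 4
terms = list(combinations(range(n), K))
J = rng.choice([-1.0, 1.0], size=len(terms))
z = [1, -1, 1, 1, -1]
theta = 0.2
h0, x0, classical = mean_field_values(theta, z, terms, J)
# h0 equals sin(theta)**K * classical, and x0 equals cos(theta)
```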

A similar mean-field treatment might be applied to a Hamiltonian H₀ that includes both Y and Z operators: given any product solution of H₀ with ⟨Z_i⟩=z_i and ⟨Y_i⟩=y_i with z_i²+y_i²=1, one can define a product state with ⟨X_i⟩=cos(θ), ⟨Z_i⟩=z_i sin(θ), and ⟨Y_i⟩=y_i sin(θ).

If this mean-field procedure were the best possible then, combined with theorem 3 taking H₀ to be the Hamiltonian of item 2, one would have a very favorable situation for the quantum algorithm: one would have (for small θ) the scaling θ²∼α²T²/D, while the expectation value of H₀ would be at most θ^K times the optimal value of H₀. Call this optimal value H₀^max. Then for case 2 to apply, one would need α∼θ^K H₀^max while α²T²/D∼θ². So one would have αT²∼Dθ^(2−K)/H₀^max. Taking, in the most optimistic situation, θ∼1/√D (since for smaller θ the expectation value of X_i is within 1/D of 1, and certainly the mean-field treatment is not accurate there), one would find that case 1 holds unless H₀^max∼N(αT²)⁻¹D^(K/2). For the case K=2, this is the same guarantee as before, but for K=4 or larger, this is a much stronger guarantee.

This mean-field procedure breaks down in general, so one cannot give such a strong guarantee for the quantum algorithm. However, this subsection shows that some guarantees are still present in the quantum algorithm. These guarantees are expressed in terms of a semi-definite programming optimization problem.

For definiteness, the case K=4 is fixed from now on. Also, from here on consider the dense case, where N_T∼N^K. The semi-definite programming problem given below remains relevant outside the dense case; however, in the dense case some additional interesting bounds can be given.

The dense case was studied previously, where it was shown that one can in general improve upon a random assignment by an amount proportional to √N_T. For K=4, this means that one can achieve ⟨H_Z⟩∼N² in the worst case. This is interesting as the problem has degree D∼N³, and so the improvement over random, even in the worst case, is by much more than N_T/D.
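As a worked check of these orders of magnitude, assume for illustration the complete instance in which every set of 4 qubits carries a term (any instance with N_T∼N⁴ behaves the same way up to constants). Using N_T=ND/K:

```latex
\begin{aligned}
N_T &= \binom{N}{4} \sim \frac{N^4}{24}, &
D &= \binom{N-1}{3} = \frac{K N_T}{N} \sim \frac{N^3}{6}, \\
\sqrt{N_T} &\sim \frac{N^2}{\sqrt{24}}, &
\frac{N_T}{D} &= \frac{N}{K} = \frac{N}{4} \ll N^2 .
\end{aligned}
```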

In fact, the classical algorithm above will typically also achieve a value at least proportional to N² for all instances. To see this, note that in case 3 of theorem 1, the polynomial p(x) in the proof will have coefficient a₁ of order N√D, but the second-order coefficient a₂ will be much smaller than ND. Indeed, for any given pair of sites i, j∈S, the coefficient of (w⃗₁)_i(w⃗₁)_j will be a weighted sum of terms (w⃗₂)_k(w⃗₂)_l for k, l∉S. While there are N² terms in the sum, for a random choice of w⃗₂, the magnitude of this sum will typically only be of order N. So, one will typically have a₂ of order N². Similar bounds can be given on higher-degree coefficients of the polynomial. So, for x∼a₁/|a₂|, one will have p(x)∼a₁²/a₂∼ND/N²∼N², as claimed.
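The step p(x)∼a₁²/a₂ can be made explicit under the simplifying assumption that p is truncated at second order and a₁>0 (for a₁<0 one takes x of the opposite sign). Choosing x*=a₁/(2|a₂|) gives a value of order a₁²/|a₂| whichever sign a₂ has:

```latex
\begin{aligned}
p(x) &\approx a_1 x + a_2 x^2, \qquad x^\ast = \frac{a_1}{2|a_2|}, \\
p(x^\ast) &= \frac{a_1^2}{2|a_2|} + \operatorname{sgn}(a_2)\,\frac{a_1^2}{4|a_2|}
\;\geq\; \frac{a_1^2}{4|a_2|}.
\end{aligned}
```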

The quantum algorithm will also typically achieve a value at least proportional to N² for all instances. To see this, consider any Hamiltonian H₀ which is a sum of terms of degree K, each with coefficient ±1. Assume that H₀ is diagonal in the Z basis (a similar calculation applies to any H₀ which includes terms in both the Y and Z bases). The goal is to optimize H₀ at a given expectation value of X. One can claim that for any such choice of H₀, the optimum is bounded by the optimum where all coefficients in H₀ are +1. To see this, work in an eigenbasis of the X_i. Then, H₀ becomes an off-diagonal operator; for any wavefunction optimizing a given choice of H₀, if one takes the absolute value of that wavefunction (i.e., in the eigenbasis of the X_i, one replaces each coefficient with its absolute value), one obtains at least as large an expectation value for the case where all coefficients are +1, while the expectation value of X is unchanged. Indeed, one obtains at least as large a value if one assumes that all coefficients are +1 and non-vanishing, i.e., that every set of K sites carries a term. However, if all coefficients in H₀ are +1, this is a soluble problem: one must optimize X+Z⁴, where Z=Σ_i Z_i. Optimizing this when the expectation value of X is N−1, one finds that the optimum is ∼N². This is indeed what one expects from mean-field theory for this problem. Note that here, rather than taking the expectation value of 1−X_i∼1/D, one instead takes the expectation value of 1−X_i∼1/N in the dense case.
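The absolute-value step can be illustrated numerically. The Python sketch below works in the X eigenbasis, where each Z-string on a K-subset flips the corresponding bits, so H₀ is an off-diagonal matrix whose nonzero entries are its ±1 coefficients, and the all-+1 operator (called H_plus here, a hypothetical name) has the same sparsity with entries +1. For a random state, replacing every amplitude by its absolute value cannot make the all-+1 value smaller than the signed value, and it leaves ⟨X⟩ unchanged. The instance (n=6, K=4, random signs on all 4-subsets) is an assumption for the demonstration.

```python
import numpy as np
from itertools import combinations

def x_basis_operators(n, K, J):
    """Return H0 (signed), H_plus (all +1) and the diagonal of X in the X eigenbasis.

    Basis state b has bit 0 for |+> and bit 1 for |-> on each qubit (qubit 0 = most
    significant bit). A Z-string on a K-subset flips those K bits with amplitude 1.
    """
    dim = 2 ** n
    idx = np.arange(dim)
    H0 = np.zeros((dim, dim))
    H_plus = np.zeros((dim, dim))
    for Jt, term in zip(J, combinations(range(n), K)):
        mask = sum(1 << (n - 1 - i) for i in term)
        H0[idx, idx ^ mask] += Jt
        H_plus[idx, idx ^ mask] += 1.0
    x_diag = np.array([n - 2 * bin(b).count("1") for b in range(dim)], dtype=float)
    return H0, H_plus, x_diag

rng = np.random.default_rng(2)
n, K = 6, 4
J = rng.choice([-1.0, 1.0], size=len(list(combinations(range(n), K))))
H0, H_plus, x_diag = x_basis_operators(n, K, J)

psi = rng.normal(size=2 ** n) + 1j * rng.normal(size=2 ** n)
psi /= np.linalg.norm(psi)
phi = np.abs(psi)                                   # absolute value of each amplitude

e_signed = np.vdot(psi, H0 @ psi).real              # <psi|H0|psi>
e_plus = np.vdot(phi, H_plus @ phi).real            # <|psi||H_plus||psi|>
x_psi = np.dot(np.abs(psi) ** 2, x_diag)            # <X> before
x_phi = np.dot(phi ** 2, x_diag)                    # <X> after (identical)
```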

At this point a slight change of notation makes things clearer in the dense case. Define β so that

$\frac{\beta}{\sqrt{N_{T}}} = {\frac{\alpha}{D}.}$

Then, one finds that in case 2 of theorem 2 the expectation value of N−X is of order β²T², while the expectation value of H₀ is of order β√N_T, and in case 1 the expectation value of H_Z is of order βT²√N_T. Choosing β²T²=1 and choosing β slightly larger than 1, one sees that case 2 cannot then occur, so case 1 must occur, and indeed one gets an expectation value of H_Z of order √N_T.
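The change of variables can be related back to the earlier scaling α²T²/D. Assuming N_T=ND/K as before:

```latex
\begin{aligned}
\beta &= \frac{\alpha\sqrt{N_T}}{D}, \qquad
\frac{\alpha^2 T^2}{D} = \beta^2 T^2 \,\frac{D}{N_T} = \beta^2 T^2 \,\frac{K}{N}, \\
\beta^2 T^2 = 1 \;&\Rightarrow\; \beta T^2 \sqrt{N_T} = \frac{\sqrt{N_T}}{\beta},
\qquad \frac{\alpha^2 T^2}{D} = \frac{K}{N}.
\end{aligned}
```

So with β²T²=1 the per-site quantity 1−⟨X_i⟩ is of order 1/N, matching the dense-case estimate above.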

Now, more general instances are considered and a bound is given on the expectation value of H₀ in general.

First, the case K=2 is considered simply to fix notation, and the general framework is given. The case K=4 is then considered. It should be understood that generalization to higher K is possible and within the scope of the disclosed technology.

For each site i, define the operator b_i^†=(|−⟩⟨+|)_i. That is, it has a nonzero matrix element only from the |+⟩ state on i to the |−⟩ state. One has b_i b_i^†=(1+X_i)/2 and b_i^† b_i=(1−X_i)/2. For any quantum state (possibly mixed) ρ, define a 2N-by-2N matrix M of correlation functions. This matrix will have a block form

$\begin{matrix} {{M = \begin{pmatrix} M_{++} & M_{+ -} \\ M_{- +} & M_{--} \end{pmatrix}},} & (56) \end{matrix}$

where M₊₊ has matrix elements

$\begin{matrix} {\left( M_{++} \right)_{ij} = \operatorname{tr}\left( \rho\, b_{i}^{\dagger}b_{j} \right),} & (57) \end{matrix}$

M₊₋ has matrix elements

$\begin{matrix} {\left( M_{+-} \right)_{ij} = \operatorname{tr}\left( \rho\, b_{i}^{\dagger}b_{j}^{\dagger} \right),} & (58) \end{matrix}$

and M₋₋ has matrix elements

$\begin{matrix} {\left( M_{--} \right)_{ij} = \operatorname{tr}\left( \rho\, b_{i}b_{j}^{\dagger} \right).} & (59) \end{matrix}$

One has M₋₊=M₊₋^†, and the matrices M₊₊ and M₋₋ are Hermitian. Since b_i and b_j^† commute for i≠j, while b_i b_i^†+b_i^† b_i=1, the elements of M₊₊ and M₋₋ are related by

$\left( M_{++} \right)_{ij} = \overline{\left( M_{--} \right)_{ij}}$

for i≠j, but

$\left( M_{++} \right)_{ii} = 1 - \left( M_{--} \right)_{ii}.$

Equivalently, given a 2N-component vector a⃗, one can define an operator O(a⃗) by

${O\left( \overset{\rightarrow}{a} \right)} = {{\sum\limits_{i = 1}^{N}{b_{i}a_{i}}} + {\sum\limits_{i = 1}^{N}{b_{i}^{\dagger}{a_{i + N}.}}}}$

Then, M is such that

$\begin{matrix} {\operatorname{tr}\left( \rho\, O(\overset{\rightarrow}{a}_{1})^{\dagger} O(\overset{\rightarrow}{a}_{2}) \right) = \overset{\rightarrow}{a}_{1}^{\dagger} \cdot M \cdot \overset{\rightarrow}{a}_{2}.} & (60) \end{matrix}$

So, M is the matrix of correlation functions of a 2N-component vector containing operators b^† and b.

Then, since for any a⃗ and any ρ one has tr(ρO(a⃗)^†O(a⃗))≥0, it follows that M is a positive semi-definite matrix. Further, one has

$\begin{matrix} {{{tr}\left( M_{++} \right)} = {{{tr}\left( {\rho \frac{N - X}{2}} \right)}.}} & (61) \end{matrix}$

One can construct a similar semi-definite programming bound in the case K=4 or larger. Now M is a (2N)^(K/2)-by-(2N)^(K/2) matrix. Focus on the case K=4. To define this matrix M, given any (2N)²-component vector a⃗, label the components by a pair (i, j), each ranging from 1 to 2N. Then define

${O\left( \overset{\rightarrow}{a} \right)} = {\sum\limits_{i = 1}^{N}{\sum\limits_{j = 1}^{N}{\left( {{{\overset{\rightarrow}{a}}_{i,j}b_{i}^{\dagger}b_{j}^{\dagger}} + {{\overset{\rightarrow}{a}}_{{i + N},j}b_{i}^{\dagger}b_{j}^{\dagger}} + {{\overset{\rightarrow}{a}}_{i,{j + N}}b_{i}^{\dagger}b_{j}^{\dagger}} + {{\overset{\rightarrow}{a}}_{{i + N},{j + N}}b_{i}^{\dagger}b_{j}^{\dagger}}} \right).}}}$

Again the matrix M is positive semi-definite. One can again write M with a block structure

$\begin{matrix} {M = {\begin{pmatrix} M_{++{;++}} & M_{++{;{+ -}}} & M_{++{;{- +}}} & M_{++{;--}} \\ M_{{+ -};++} & M_{{+ -};{+ -}} & M_{{+ -};{- +}} & M_{{+ -};--} \\ M_{{- +};++} & M_{{- +};{+ -}} & M_{{- +};{- +}} & M_{{- +};--} \\ M_{--{;++}} & M_{--{;{+ -}}} & M_{--{;{- +}}} & M_{--{;--}} \end{pmatrix}.}} & (62) \end{matrix}$

The diagonal entries of M can be bounded in terms of the second moment of N−X. This gives a semi-definite programming relaxation that allows one to bound the expectation value of H₀ at large X.
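As one illustration of the statement above, suppose the (++;++) block is the Gram matrix of the pair operators b_i b_j (an assumption chosen to match (M₊₊)_{ij}=tr(ρ b_i^† b_j) in eq. (57)). Writing n_i=b_i^† b_i=(1−X_i)/2, which is a projector, and using b_i²=0 and the fact that n_i commutes with b_j for i≠j, the diagonal of this block sums to a quadratic function of N−X:

```latex
\begin{aligned}
\sum_{i,j=1}^{N} \left(M_{++;++}\right)_{(i,j),(i,j)}
&= \sum_{i \neq j} \operatorname{tr}\!\left(\rho\, b_j^\dagger b_i^\dagger b_i b_j\right)
 = \sum_{i \neq j} \operatorname{tr}\!\left(\rho\, n_i n_j\right) \\
&= \operatorname{tr}\!\left(\rho \left[\left(\frac{N-X}{2}\right)^{2} - \frac{N-X}{2}\right]\right).
\end{aligned}
```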

VII. Discussion

The embodiments disclosed include an algorithm that uses quantum quenches, as well as a classical algorithm, to perform approximate optimization. It was also proven that both of these algorithms obtain a result that improves upon the random approach by an amount that is more than N/D unless a related problem has a “very bad” solution. This can then be used in some cases to guarantee that the algorithm will find a nontrivial improvement if no such solution of the related problem exists. Additional guarantees can be given for the quantum algorithm.

The example quench algorithm is not described by a fixed-depth quantum circuit independent of D. The Lieb-Robinson velocity v_LR of this Hamiltonian is proportional to √α, as can be shown by using Lieb-Robinson bounds adapted to Hamiltonians that are a sum of two types of terms such that terms within a type commute (in this case, the X_j for different qubits j are one type and the terms in H_Z are another type); more generally, one can use bounds adapted to the case of a bounded commutator. To define the Lieb-Robinson velocity, one can define a distance between qubits by using a graph metric for a graph with vertices corresponding to qubits and an edge between two vertices if the corresponding qubits both appear in some term in H_Z.

The estimates using the Lieb-Robinson velocity give an upper bound on how far a perturbation can propagate in a given time; the effect of a perturbation beyond a distance proportional to v_LR t is negligible. These estimates may not be tight, but one expects that the velocity of perturbations is indeed proportional to √α in many systems. If this is true, and if αt² diverges with D in order to obtain a nontrivial approximation, then the necessary circuit depth also diverges.
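The graph metric used to define the Lieb-Robinson velocity can be computed by breadth-first search on the interaction graph. The Python sketch below is illustrative: the function names are hypothetical, and the proportionality constant c in the light-cone radius c√α·t is an unspecified placeholder set to 1, since the text gives only the scaling v_LR ∝ √α.

```python
import math
from collections import deque
from itertools import combinations

def interaction_graph(n, terms):
    """Edge between qubits a and b if both appear in some term of H_Z."""
    adj = {q: set() for q in range(n)}
    for term in terms:
        for a, b in combinations(term, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj

def graph_distances(adj, source):
    """Breadth-first search: graph distance from source to every reachable qubit."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def light_cone(adj, source, alpha, t, c=1.0):
    """Qubits within distance c * sqrt(alpha) * t of source (c is a placeholder)."""
    radius = c * math.sqrt(alpha) * t
    return {q for q, d in graph_distances(adj, source).items() if d <= radius}

# Chain of K = 2 terms on 6 qubits.
adj = interaction_graph(6, [(i, i + 1) for i in range(5)])
print(sorted(light_cone(adj, 0, alpha=4.0, t=1.0)))   # [0, 1, 2]
```

Breadth-first search suffices because every edge has unit length in this metric.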

VIII. General Embodiments

In this section, example methods for performing aspects of the disclosed embodiments are disclosed. The particular embodiments described should not be construed as limiting, as the disclosed method acts can be performed alone, in different orders, or at least partially simultaneously with one another. Further, any of the disclosed methods or method acts can be performed with any other methods or method acts disclosed herein.

FIG. 5 is an example method for performing an approximate optimization technique using a quantum quench algorithm as disclosed herein.

At 510, a quantum computing device is configured to perform an approximate optimization technique to approximate a solution to a combinatorial optimization problem.

At 512, the approximate optimization technique is performed on the quantum computing device, wherein the approximate optimization technique includes using a quantum quench algorithm.

In certain implementations, the quench algorithm includes averaging state values over a plurality of times. In particular implementations, the controlling comprises changing coupling constants without using a jump or slow change in the coupling constants. In some examples, the changing of the coupling constants is performed non-adiabatically. In certain examples, the changing of the coupling constants is followed by an equilibration time.

In some implementations, the method further comprises reading out results of the approximate optimization technique from the quantum computing device; and storing the results in a classical computing device.
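One hypothetical way to configure a gate-based device for the acts at 510 and 512 is to approximate the sudden quench by a first-order Trotter product. This is a swapped-in technique for illustration, and the gate tuples ("h", "rx", "zphase", "measure_z") are placeholders rather than any particular SDK's API. Here rx(φ) is taken to mean exp(−iφX/2) and zphase(φ) to mean exp(−iφ Z⊗…⊗Z/2), so each step applies exp(−iX_q dt) and exp(−i(α/D)J_t Z_t dt). Averaging over times is represented by drawing the evolution time uniformly from [0, T] for each shot.

```python
import random

def quench_program(n, terms, couplings, alpha, D, t, steps):
    """First-order Trotter circuit for exp(-i t (X + (alpha/D) H_Z)) acting on |+>^n."""
    dt = t / steps
    program = [("h", (q,), None) for q in range(n)]           # prepare |+>^n
    for _ in range(steps):
        for q in range(n):
            program.append(("rx", (q,), 2.0 * dt))              # exp(-i X_q dt)
        for term, J in zip(terms, couplings):
            program.append(("zphase", tuple(term), 2.0 * (alpha / D) * J * dt))
    program += [("measure_z", (q,), None) for q in range(n)]   # read out an assignment
    return program

def sample_quench_programs(n, terms, couplings, alpha, D, T, steps, shots, seed=0):
    """Each shot uses an evolution time drawn uniformly from [0, T]."""
    rng = random.Random(seed)
    return [quench_program(n, terms, couplings, alpha, D, rng.uniform(0.0, T), steps)
            for _ in range(shots)]

programs = sample_quench_programs(4, [(0, 1), (1, 2), (2, 3), (3, 0)], [1, -1, 1, 1],
                                  alpha=1.0, D=2, T=3.0, steps=5, shots=3)
print(len(programs), len(programs[0]))   # 3 48
```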

FIG. 6 is another example method for performing an approximate optimization technique using a quantum quench algorithm as disclosed herein.

At 610, a quantum quench algorithm is performed on a classical computing device using simulation of a quantum Hamiltonian to perform approximate optimization. In certain implementations, the quench algorithm includes averaging state values over a plurality of times.
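A minimal classical simulation in the spirit of FIG. 6 can be sketched with exact diagonalization, which is only feasible for small N. In this Python sketch (all names and the ring instance are hypothetical), each shot draws an evolution time uniformly from [0, T], evolves |+⟩^N under H=X+(α/D)H_Z, and samples a Z-basis assignment; uniform sampling of t is one assumed way to implement the averaging of state values over a plurality of times. Because H is conserved and ⟨X⟩≤N, the expectation ⟨H_Z⟩(t)=(D/α)(N−⟨X⟩(t)) is nonnegative at every time for α>0, so the sketch records it as a sanity check.

```python
import numpy as np

def simulate_quench(n, terms, couplings, alpha, D, T, shots, seed=0):
    """Classically simulate the quench H = X + (alpha/D) H_Z from |+>^n.

    Returns sampled spin assignments, their H_Z values, and <H_Z>(t) for each shot.
    """
    rng = np.random.default_rng(seed)
    dim = 2 ** n
    idx = np.arange(dim)
    spins = 1.0 - 2.0 * ((idx[:, None] >> (n - 1 - np.arange(n))) & 1)  # z_i = +/-1
    hz = sum(J * np.prod(spins[:, list(term)], axis=1) for term, J in zip(terms, couplings))
    H = np.diag((alpha / D) * hz)
    for i in range(n):
        H[idx, idx ^ (1 << (n - 1 - i))] += 1.0                     # add X_i
    evals, evecs = np.linalg.eigh(H)
    c0 = evecs.conj().T @ np.full(dim, dim ** -0.5)                  # |+>^n
    samples, values, expectations = [], [], []
    for _ in range(shots):
        t = rng.uniform(0.0, T)                                      # random quench time
        psi = evecs @ (np.exp(-1j * evals * t) * c0)
        probs = np.abs(psi) ** 2
        b = rng.choice(dim, p=probs / probs.sum())                   # Z-basis measurement
        samples.append(spins[b])
        values.append(hz[b])
        expectations.append(np.dot(probs, hz))
    return np.array(samples), np.array(values), np.array(expectations)

terms = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
couplings = [1, -1, 1, 1, -1, 1]
samples, values, expectations = simulate_quench(6, terms, couplings,
                                                alpha=1.0, D=2, T=5.0, shots=50)
print(values.max(), expectations.min() >= -1e-9)
```

In practice one would keep the best sampled assignment, which is the classical analogue of reading out and storing results.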

IX. Example Computing Environments

FIG. 1 illustrates a generalized example of a suitable classical computing environment 100 in which several of the described embodiments can be implemented. The computing environment 100 is not intended to suggest any limitation as to the scope of use or functionality of the disclosed technology, as the techniques and tools described herein can be implemented in diverse general-purpose or special-purpose environments that have computing hardware.

With reference to FIG. 1, the computing environment 100 includes at least one processing device 110 and memory 120. In FIG. 1, this most basic configuration 130 is included within a dashed line. The processing device 110 (e.g., a CPU or microprocessor) executes computer-executable instructions. In a multi-processing system, multiple processing devices execute computer-executable instructions to increase processing power. The memory 120 may be volatile memory (e.g., registers, cache, RAM, DRAM, SRAM), non-volatile memory (e.g., ROM, EEPROM, flash memory), or some combination of the two. The memory 120 stores software 180 implementing tools for performing any of the approximation methods disclosed herein for addressing combinatorial optimization problems, using a classical computer and/or quantum computer. For example, the memory 120 can store software for controlling a quantum circuit to implement an embodiment of the disclosed approximation technique. The memory 120 can also store software 180 for synthesizing, generating (or compiling), and/or controlling quantum circuits as described herein.

The computing environment can have additional features. For example, the computing environment 100 includes storage 140, one or more input devices 150, one or more output devices 160, and one or more communication connections 170. An interconnection mechanism (not shown), such as a bus, controller, or network, interconnects the components of the computing environment 100. Typically, operating system software (not shown) provides an operating environment for other software executing in the computing environment 100, and coordinates activities of the components of the computing environment 100.

The storage 140 can be removable or non-removable, and includes one or more magnetic disks (e.g., hard drives), solid state drives (e.g., flash drives), magnetic tapes or cassettes, CD-ROMs, DVDs, or any other tangible non-volatile storage medium which can be used to store information and which can be accessed within the computing environment 100. The storage 140 can also store instructions for the software 180 implementing tools for performing any of the approximation methods disclosed herein for addressing combinatorial optimization problems, using a classical computer and/or quantum computer. For example, the storage 140 can store software for controlling a quantum circuit to implement an embodiment of the disclosed approximation technique. The storage 140 can also store instructions for the software 180 for synthesizing, generating (or compiling), and/or controlling quantum circuits as described herein.

The input device(s) 150 can be a touch input device such as a keyboard, touchscreen, mouse, pen, trackball, a voice input device, a scanning device, or another device that provides input to the computing environment 100. The output device(s) 160 can be a display device (e.g., a computer monitor, laptop display, smartphone display, tablet display, netbook display, or touchscreen), printer, speaker, or another device that provides output from the computing environment 100.

The communication connection(s) 170 enable communication over a communication medium to another computing entity. The communication medium conveys information such as computer-executable instructions or other data in a modulated data signal. A modulated data signal is a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media include wired or wireless techniques implemented with an electrical, optical, RF, infrared, acoustic, or other carrier.

As noted, the various methods, quantum circuit control techniques, or compilation/synthesis techniques can be described in the general context of computer-readable instructions stored on one or more computer-readable media. Computer-readable media are any available media (e.g., memory or storage device) that can be accessed within or by a computing environment. Computer-readable media include tangible computer-readable memory or storage devices such as memory 120 and/or storage 140, and do not include propagating carrier waves or signals per se (tangible computer-readable memory or storage devices do not include propagating carrier waves or signals per se).

Various embodiments of the methods disclosed herein can also be described in the general context of computer-executable instructions (such as those included in program modules) being executed in a computing environment by a processor. Generally, program modules include routines, programs, libraries, objects, classes, components, data structures, and so on, that perform particular tasks or implement particular abstract data types. The functionality of the program modules may be combined or split between program modules as desired in various embodiments. Computer-executable instructions for program modules may be executed within a local or distributed computing environment.

An example of a possible network topology 200 (e.g., a client-server network) for implementing a system according to the disclosed technology is depicted in FIG. 2. Networked computing device 220 can be, for example, a computer running a browser or other software connected to a network 212. The computing device 220 can have a computer architecture as shown in FIG. 1 and discussed above. The computing device 220 is not limited to a traditional personal computer but can comprise other computing hardware configured to connect to and communicate with a network 212 (e.g., smart phones, laptop computers, tablet computers, or other mobile computing devices, servers, network devices, dedicated devices, and the like). Further, the computing device 220 can comprise an FPGA or other programmable logic device. In the illustrated embodiment, the computing device 220 is configured to communicate with a computing device 230 (e.g., a remote server, such as a server in a cloud computing environment) via a network 212. In the illustrated embodiment, the computing device 220 is configured to transmit input data to the computing device 230, and the computing device 230 is configured to implement a quantum circuit control technique according to any of the disclosed embodiments and/or a circuit generation or compilation/synthesis method for generating quantum circuits for use with any of the techniques disclosed herein. The computing device 230 can output results to the computing device 220. Any of the data received from the computing device 230 can be stored or displayed on the computing device 220 (e.g., displayed as data on a graphical user interface or web page at the computing device 220). In the illustrated embodiment, the illustrated network 212 can be implemented as a Local Area Network (LAN) using wired networking (e.g., the Ethernet IEEE standard 802.3 or other appropriate standard) or wireless networking (e.g., one of the IEEE standards 802.11a, 802.11b, 802.11g, or 802.11n or other appropriate standard). Alternatively, at least part of the network 212 can be the Internet or a similar public network and operate using an appropriate protocol (e.g., the HTTP protocol).

Another example of a possible network topology 300 (e.g., a distributed computing environment) for implementing a system according to the disclosed technology is depicted in FIG. 3. Networked computing device 320 can be, for example, a computer running a browser or other software connected to a network 312. The computing device 320 can have a computer architecture as shown in FIG. 1 and discussed above. In the illustrated embodiment, the computing device 320 is configured to communicate with multiple computing devices 330, 331, 332 (e.g., remote servers or other distributed computing devices, such as one or more servers in a cloud computing environment) via the network 312. In the illustrated embodiment, each of the computing devices 330, 331, 332 in the computing environment 300 is used to perform at least a portion of a quantum circuit control technique according to any of the disclosed embodiments and/or a circuit generation or compilation/synthesis method for generating quantum circuits for use with any of the techniques disclosed herein. In other words, the computing devices 330, 331, 332 form a distributed computing environment in which the quantum circuit control and/or generation/compilation/synthesis processes are shared across multiple computing devices. The computing device 320 is configured to transmit input data to the computing devices 330, 331, 332, which are configured to distributively implement such a process, including performance of any of the disclosed methods or creation of any of the disclosed circuits, and to provide results to the computing device 320. Any of the data received from the computing devices 330, 331, 332 can be stored or displayed on the computing device 320 (e.g., displayed as data on a graphical user interface or web page at the computing device 320). The illustrated network 312 can be any of the networks discussed above with respect to FIG. 2.

With reference to FIG. 4, an exemplary system for implementing the disclosed technology includes computing environment 400. In computing environment 400, a compiled quantum computer circuit description (including quantum circuits for performing any of the disclosed approximation techniques) can be used to program (or configure) one or more quantum processing units such that the quantum processing unit(s) implement the circuit described by the quantum computer circuit description.

The environment 400 includes one or more quantum processing units 402 and one or more readout device(s) 408. The quantum processing unit(s) execute quantum circuits that are precompiled and described by the quantum computer circuit description. The quantum processing unit(s) can be one or more of, but are not limited to: (a) a superconducting quantum computer; (b) an ion trap quantum computer; (c) a fault-tolerant architecture for quantum computing; and/or (d) a topological quantum architecture (e.g., a topological quantum computing device using Majorana zero modes). The precompiled quantum circuits, including any of the disclosed circuits, can be sent into (or otherwise applied to) the quantum processing unit(s) via control lines 406 at the control of quantum processor controller 420. The quantum processor controller (QP controller) 420 can operate in conjunction with a classical processor 410 (e.g., having an architecture as described above with respect to FIG. 1) to implement the desired quantum computing process. In the illustrated example, the QP controller 420 further implements the desired quantum computing process via one or more QP subcontrollers 404 that are specially adapted to control a corresponding one of the quantum processor(s) 402. For instance, in one example, the quantum controller 420 facilitates implementation of the compiled quantum circuit by sending instructions to one or more memories (e.g., lower-temperature memories), which then pass the instructions to low-temperature control unit(s) (e.g., QP subcontroller(s) 404) that transmit, for instance, pulse sequences representing the gates to the quantum processing unit(s) 402 for implementation. In other examples, the QP controller(s) 420 and QP subcontroller(s) 404 operate to provide appropriate magnetic fields, encoded operations, or other such control signals to the quantum processor(s) to implement the operations of the compiled quantum computer circuit description. The quantum controller(s) can further interact with readout devices 408 to help control and implement the desired quantum computing process (e.g., by reading or measuring out data results from the quantum processing units once available, etc.).

With reference to FIG. 4, compilation is the process of translating a high-level description of a quantum algorithm into a quantum computer circuit description comprising a sequence of quantum operations or gates, which can include the circuits as disclosed herein. The compilation can be performed by a compiler 422 using a classical processor 410 (e.g., as shown in FIG. 1) of the environment 400 which loads the high-level description from memory or storage devices 412 and stores the resulting quantum computer circuit description in the memory or storage devices 412.

In other embodiments, compilation and/or verification can be performed remotely by a remote computer 460 (e.g., a computer having a computing environment as described above with respect to FIG. 1) which stores the resulting quantum computer circuit description in one or more memory or storage devices 462 and transmits the quantum computer circuit description to the computing environment 400 for implementation in the quantum processing unit(s) 402. Still further, the remote computer 460 can store the high-level description in the memory or storage devices 462 and transmit the high-level description to the computing environment 400 for compilation and use with the quantum processor(s). In any of these scenarios, results from the computation performed by the quantum processor(s) can be communicated to the remote computer after and/or during the computation process. Still further, the remote computer can communicate with the QP controller(s) 420 such that the quantum computing process (including any compilation, verification, and QP control procedures) can be remotely controlled by the remote computer 460. In general, the remote computer 460 communicates with the QP controller(s) 420, compiler/synthesizer 422, and/or verification tool 423 via communication connections 450.

In particular embodiments, the environment 400 can be a cloud computing environment, which provides the quantum processing resources of the environment 400 to one or more remote computers (such as remote computer 460) over a suitable network (which can include the internet).

X. Concluding Remarks

Having described and illustrated the principles of the disclosed technology with reference to the illustrated embodiments, it will be recognized that the illustrated embodiments can be modified in arrangement and detail without departing from such principles. For instance, elements of the illustrated embodiments shown in software may be implemented in hardware and vice-versa. Also, the technologies from any example can be combined with the technologies described in any one or more of the other examples. It will be appreciated that procedures and functions such as those described with reference to the illustrated examples can be implemented in a single hardware or software module, or separate modules can be provided. The particular arrangements above are provided for convenient illustration, and other arrangements can be used. 

What is claimed is:
 1. A method, comprising: configuring a quantum computing device to perform an approximate optimization technique to approximate a solution to a combinatorial optimization problem; and performing the approximate optimization technique on the quantum computing device, wherein the approximate optimization technique includes using a quantum quench algorithm.
 2. The method of claim 1, wherein the quench algorithm includes averaging state values over a plurality of times.
 3. The method of claim 1, wherein the controlling comprises: changing coupling constants without using a jump or slow change in the coupling constants.
 4. The method of claim 3, wherein the changing of the coupling constants is performed non-adiabatically.
 5. The method of claim 3, wherein the changing of the coupling constants is followed by an equilibration time.
 6. The method of claim 1, further comprising: reading out results of the approximate optimization technique from the quantum computing device; and storing the results in a classical computing device.
 7. A system, comprising: a classical computer; and a quantum computer, wherein the quantum computing device is configured to perform an approximate optimization technique on the quantum computing device to approximate a solution to a combinatorial optimization problem, wherein the approximate optimization technique includes using a quantum quench algorithm.
 8. The system of claim 7, wherein the quench algorithm includes averaging state values over a plurality of times.
 9. The system of claim 7, wherein the approximate optimization technique comprises changing coupling constants without using a jump or slow change in the coupling constants.
 10. The system of claim 9, wherein the changing of the coupling constants is performed non-adiabatically.
 11. The system of claim 9, wherein the changing of the coupling constants is followed by an equilibration time.
 12. The system of claim 9, wherein the classical computer is further configured to: read out results of the approximate optimization technique from the quantum computing device; and store the results in a classical computing device.
 13. One or more computer-readable media storing computer-executable instructions, which when executed by a classical computing device, cause the classical computing device to perform a method, the method comprising: configuring a quantum computing device to perform an approximate optimization technique to approximate a solution to a combinatorial optimization problem; and performing the approximate optimization technique on the quantum computing device, wherein the approximate optimization technique includes using a quantum quench algorithm.
 14. The one or more computer-readable media of claim 13, wherein the quench algorithm includes averaging state values over a plurality of times.
 15. The one or more computer-readable media of claim 13, wherein the controlling comprises: changing coupling constants without using a jump or slow change in the coupling constants.
 16. The one or more computer-readable media of claim 15, wherein the changing of the coupling constants is performed non-adiabatically.
 17. The one or more computer-readable media of claim 15, wherein the changing of the coupling constants is followed by an equilibration time.
 18. The one or more computer-readable media of claim 13, further comprising: reading out results of the approximate optimization technique from the quantum computing device; and storing the results.
 19. A method, comprising performing, on a classical computing device, a quantum quench algorithm using simulation of a quantum Hamiltonian to perform approximate optimization.
 20. The method of claim 19, wherein the quench algorithm includes averaging state values over a plurality of times. 